
DNA fingerprinting and molecular epidemiology of tuberculosis: Use and interpretation in an epidemic setting

Tuberculosis (TB) is still a major cause of morbidity and mortality. It is clear that control requires more than simple availability of antibiotics. In order to gain insight into the disease, DNA fingerprinting has been applied to the study of bacterial population structure. This technology has been used to quantitate various components of the disease in a high-incidence community, viz. recent transmission (RT) and reactivation (RA), and to monitor these over time as a tool to quantitate changes in the epidemic. In our high-incidence community, we find unexpectedly high strain diversity, lower than predicted RT, and that reactivation disease dominates. This technology can be used to examine and challenge traditional dogmas. The quantitative measure of RT varies over time; a two-year sliding window is a useful period for estimation. The results show that the “epidemic” consists of subepidemics characterized by strain families that wax and wane in the community of TB patients. The technology is shown to be a useful and quantitative tool to assess disease status and can therefore be used to monitor intervention strategies and to refine and monitor results of new control measures.

Keywords: Tuberculosis / Molecular epidemiology / DNA fingerprinting / Disease dynamics

EL 3454

Robin Warren¹
Madalene Richardson¹
Gian van der Spuy¹
Thomas Victor¹
Samantha Sampson¹
Nulda Beyers²
Paul van Helden¹

¹MRC Centre for Molecular and Cellular Biology, Department of Medical Biochemistry

²Department of Pediatrics and Child Health, University of Stellenbosch, Tygerberg, South Africa

1 Introduction

Despite the introduction of antibiotics and control programs, tuberculosis (TB) continues to have a major impact on global health. As suggested by René Dubos [1], antibiotics may not be the most effective way to deal with pathogens, but the answers may lie in understanding the ecology of the bacterium and its host and the interaction between them. An example of this may be smallpox, where a good understanding of the biology of the organism and the behavior of the host made eradication possible with vaccination. Despite decades of research and the use of antibiotics, TB remains a threat to human morbidity and mortality, with an estimated 1/3 of the global population infected with this organism. South Africa has possibly some of the highest incidence areas in the world, and as in many other communities the disease incidence remains high. The failure to effectively combat infectious diseases such as TB can clearly be ascribed to a number of factors, one of which may be the failure to understand the dynamics of this disease adequately, followed by a failure to implement adequate control strategies [2, 3].

Epidemiology has been used for decades to study the dynamics of many infectious diseases. The insight gained has contributed extensively towards the control of many diseases [2], despite being based on a number of assumptions. One of the aims of epidemiology is to identify sources and index cases from which disease spreads. It is therefore assumed that if it can be shown that a person has had contact with a diseased individual and subsequently develops the same disease, a transmission event occurred [4, 5, 7, 10, 14]. This is almost certainly a valid assumption in a low-incidence environment, but becomes an intractable problem and may lead to errors in a high-incidence community, where multiple contacts are possible. The unraveling of these events by field work becomes impossible and other methods are therefore required. An example of this is tuberculosis.

We have used DNA fingerprinting in order to attempt to gain understanding of this disease in our community. It is possible to fingerprint isolates of this organism using a number of probes, the most commonly used of which is a transposable element, IS6110 [4–6]. Other probes may be used, such as those based on oligonucleotide fingerprinting [7]. Fingerprints have suggested that each TB patient is infected with one clonal isolate of Mycobacterium tuberculosis only. This makes it possible to trace the transmission of individual isolates through a community [5–7] and use the data to examine the dynamics of the disease [5, 8].

Correspondence: Dr. P. van Helden, Department of Pediatrics and Child Health, University of Stellenbosch, PO Box 19063, Tygerberg, 7505, South Africa
E-mail: [email protected]
Fax: +27-(0)21-938-9476

Abbreviations: TB, tuberculosis; UPGMA, unweighted pair group method using arithmetic averages

Electrophoresis 1999, 20, 1807–1812

© WILEY-VCH Verlag GmbH, 69451 Weinheim, 1999 0173-0835/99/0808-1807 $17.50+.50/0


2 Materials and methods

Two neighboring suburban communities of the greater metropolitan area of Cape Town, South Africa, containing approximately 35 000 individuals in total, concentrated into 2.4 km², were used for this study [8]. M. tuberculosis isolates were obtained from sputum samples from patients presenting with pulmonary tuberculosis and, for a small number of childhood cases, from gastric aspirates. These samples were collected over a period of 69 months, from early 1993 to June 1998. Sputum samples were liquefied and plated on LJ slants in a routine laboratory (South African Institute for Medical Research). Organisms were rendered noninfectious by heat and recovered from the slant by scraping, and DNA was isolated as described previously [8]. DNA from the isolates was purified and subjected to Southern blotting, using two probes, viz. transposon sequence IS6110 with PvuII digestion [4–8] and a TB genome sequence derived from a (GTG)₅ hybridizing fragment after HinfI digestion [7]. Each lane on the gel (and blot) was also run with an internal marker (marker X; Boehringer Mannheim, Germany) and each gel was run with two external markers of DNA from M. tuberculosis strain Mt 14323 in order to minimize error [13]. The resultant blots were hybridized and scrubbed to sequentially expose autoradiographs for the marker and the probes in order to enhance the accuracy of analysis.

The autorads were scanned and analyzed with GelCompar Version 4.1 software, using UPGMA and Dice coefficients. A “cluster” of strains was identified when one isolate from a patient was 100% identical with an isolate from a separate patient (tolerance of 0.7% allowed, Rf per band position). A family of strains was identified when a grouping of isolates with greater than 65% identity was seen on the dendrogram (also visually confirmed). From the pattern matching, clustering was estimated using two separate formulas. Total clustering was calculated as T(c)/T(a), where T(c) is the total number of strains within clusters and T(a) is the total number of isolates. Recent transmission was estimated as [T(c) − N(c)]/T(a), where N(c) is the number of clusters [5]. Laboratory error was estimated by comparing serial isolates (taken within two months after the initial isolate) from patients. The percentage error was calculated as follows:

\[
\text{percentage error} = \frac{\text{number of isolate pairs which failed to match}}{\text{total number of serial isolates analyzed}} \times 100
\]
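The pattern-matching and clustering definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GelCompar implementation: band positions are assumed to be normalized to a 0–100 scale, the 0.7% tolerance is applied as an absolute window on that scale, identical fingerprints are grouped greedily rather than by UPGMA, and the isolate names and band positions are hypothetical. Only the formulas T(c)/T(a), [T(c) − N(c)]/T(a) and the percentage error come from the text.

```python
from typing import Dict, List, Sequence


def dice_similarity(a: Sequence[float], b: Sequence[float], tol: float = 0.7) -> float:
    """Dice coefficient 2*m / (len(a) + len(b)), where m counts bands of the
    two lanes that lie within `tol` of each other (assumed matching rule)."""
    a, b = sorted(a), sorted(b)
    i = j = matched = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            matched += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    total = len(a) + len(b)
    return 2.0 * matched / total if total else 0.0


def group_identical(patterns: Dict[str, Sequence[float]], tol: float = 0.7) -> List[List[str]]:
    """Greedily group isolates whose fingerprints are 100% identical."""
    groups: List[List[str]] = []
    for name, bands in patterns.items():
        for group in groups:
            if dice_similarity(patterns[group[0]], bands, tol) == 1.0:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


def total_clustering(groups: List[List[str]]) -> float:
    """T(c)/T(a) as a percentage."""
    t_a = sum(len(g) for g in groups)
    t_c = sum(len(g) for g in groups if len(g) >= 2)
    return 100.0 * t_c / t_a


def recent_transmission(groups: List[List[str]]) -> float:
    """[T(c) - N(c)]/T(a) as a percentage."""
    t_a = sum(len(g) for g in groups)
    clusters = [g for g in groups if len(g) >= 2]
    t_c = sum(len(g) for g in clusters)
    return 100.0 * (t_c - len(clusters)) / t_a


def percent_error(failed_pairs: int, serial_isolates: int) -> float:
    """Laboratory error: mismatched serial pairs over serial isolates analyzed."""
    return 100.0 * failed_pairs / serial_isolates


# Hypothetical band positions (normalized migration, 0-100) for six patients.
PATTERNS = {
    "P1": [10.0, 25.0, 40.0, 62.5],
    "P2": [10.3, 24.6, 40.2, 62.0],
    "P3": [10.0, 25.0, 55.0],
    "P4": [12.0, 30.0, 48.0, 70.0, 85.0],
    "P5": [12.1, 30.4, 47.5, 70.2, 84.9],
    "P6": [12.0, 30.0, 48.0, 70.0, 85.6],
}

GROUPS = group_identical(PATTERNS)
print(GROUPS)                       # two clusters and one unique isolate
print(recent_transmission(GROUPS))  # (5 - 2) / 6 -> 50.0
```

With these hypothetical lanes, five of six isolates fall into two clusters, so total clustering is 5/6 while the recent-transmission estimate is (5 − 2)/6; subtracting N(c) removes one presumed source case per cluster.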

3 Results and discussion

A community known for its high incidence of tuberculosis was chosen [8]. This community was selected since it is surrounded on three sides by industrial areas or highways and is continuous with other suburban areas on one side only. There are two TB clinics serving the area and good data are available on patients. The database consisted of single primary isolates from 881 patients, taken during a 68-month period from 1993 to 1998. An example of the analysis is shown in Fig. 1 (50 isolates, representing a portion of the database).

Figure 1. RFLP analysis of M. tuberculosis isolates according to IS6110 fingerprinting. Figure illustrates two families of strains, with a family similarity of less than 40%. Within each family, there are clusters of identical strains (100% similarity), unique strains within the family context, and approximately 80% similarity within each family of all isolates.

We estimate a sample recovery rate of approximately 70% from adult pulmonary cases over this period, with a cumulative error rate of 3.4% (17 serial isolates failed to match from 499 serial isolates). The use of external standards per gel and an internal standard for each lane, as well as increasing the migration distance by 50% during electrophoresis, allowed us to enhance accuracy over the standard procedures described for M. tuberculosis fingerprinting [6]. For example, using the standard IS6110 procedure, it was not possible to resolve a large cluster of isolates with four insertion elements. However, on extended electrophoresis, and using internal standards, we were able to resolve this large cluster into two approximately equal-sized clusters. This subclustering was subsequently confirmed by the use of other probes (results not shown). Overall resolution was also substantially improved, as resolution between two DNA fragments of similar Rf is enhanced.

Using the IS6110 element as a probe, it is possible to establish the number and size of many clusters (see Table 1). It is possible to review this information over various windows of time, to attempt to gain some insight into the transmission of disease. The generally accepted clinical definition of recent transmission in tuberculosis is progression to active disease within 24 months after infection. Table 2 shows that an estimate of this (taken over the average of four shifting windows) is approximately 47%, suggesting that approximately half of the cases result from infection prior to a two-year period, reflecting so-called reactivation disease from reactivated dormant bacilli [9]. However, it has previously been shown that the use of IS6110 alone will result in an overestimate of recent transmission and that the use of other probes, such as MTB 484 [1], will reduce the overall estimate of recent transmission by approximately 8–10%, viz. to 39% [7, 10]. Therefore, it is evident that the majority of disease in our high-incidence community may be defined as reactivation disease.

Table 1. Clusters of M. tuberculosis isolates (68 months)

Insertions  Samples  Clusters  Samples/cluster                  Uniques  Clustered samples
1           10       1         9                                1        9
2           28       1         27                               1        27
3           42       5         2,6,10,12,12                     0        42
4           104      1         102                              2        102
5           56       2         6,45                             5        51
6           3        0         0                                3        0
7           5        2         2,2                              1        4
8           7        1         2                                5        2
9           19       4         2,2,3,6                          6        13
10          49       8         2,2,3,3,3,3,4,11                 18       31
11          81       15        2,2,2,2,2,2,2,2,3,3,4,4,6,8,11   26       55
12          74       14        2,2,2,2,2,2,2,3,3,3,3,6,7,7      28       46
13          70       15        2,2,2,2,2,2,2,2,3,3,3,4,5,6,11   19       51
14          83       13        2,2,2,2,2,2,2,3,4,4,4,20,22      12       71
15          48       11        2,2,2,2,2,2,2,2,2,3,3            24       24
16          24       2         2,10                             12       12
17          14       5         2,2,3,3,3                        1        13
18          9        2         2,3                              4        5
19          15       1         9                                6        9
20          15       4         2,3,3,5                          2        13
21          15       3         2,4,5                            4        11
22          43       2         3,35                             5        38
23          45       3         2,2,36                           5        40
24          10       3         2,3,3                            2        8
25          3        1         3                                0        3
26          9        2         3,5                              1        8
Total       881      121                                        193      688
            (697)    (113)                                      (189)    (508)

Clustering: 56.7%
Figures in parentheses exclude 1–4 insertions.

Table 2. TB transmission estimates according to different time frames

Time (months) a)     6           12     24     42     68
Clustering b)        (12–30)%    41%    47%    53%    57%
Clustering (n) c)    (30%)       56%    62%    70%    73%

a) Six-month estimate based on sliding average. 12-month estimate is the average of 3 separate years. 24-month estimate is an average of 4 windows, shifted by 12 months (range 44.4–49.5%).
b) Estimate according to the formula: [No. of strains in clusters (T(c)) − No. of clusters (N(c))] / Total No. of strains (T(a))
c) (n) Estimate represents % of strains in clusters

These results exclude the analysis of isolates having between 1 and 4 insertion elements, since the discriminatory power of the IS6110 element is regarded as inadequate for genotyping of low insertion copy number strains [4–8, 10]. An estimate of recent transmission including these isolates is 64% over 68 months; averaged over a 2-year window, including them increases the estimate by 7%, viz. to 54% overall. However, using MTB 484 [1] as a subclustering probe (results not shown), the estimate of recent transmission in the 1–4 insert isolates drops by approximately 18–36%. This is a similar figure to that reported for the whole database excluding the 1–4 band inserts. Thus, an estimate of recent transmission over a 2-year period in this high-incidence community is approximately 36–44%, suggesting that approximately 60% of the disease is reactivation disease (i.e., the period for active disease emergence after infection exceeds two years). This has major implications for disease control, as it implies that the higher this component of TB, the longer it will take for the disease to be eradicated, and it illustrates the importance of controlling transmission.
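The effect of excluding low-copy-number isolates can be checked directly from the cluster sizes and unique counts listed in Table 1. The following is a minimal sketch: the dictionary is transcribed from Table 1, while the function and variable names are illustrative choices and not part of the original analysis.

```python
# IS6110 insertions -> (sizes of each cluster, number of unique isolates),
# transcribed from Table 1.
TABLE_1 = {
    1: ([9], 1),
    2: ([27], 1),
    3: ([2, 6, 10, 12, 12], 0),
    4: ([102], 2),
    5: ([6, 45], 5),
    6: ([], 3),
    7: ([2, 2], 1),
    8: ([2], 5),
    9: ([2, 2, 3, 6], 6),
    10: ([2, 2, 3, 3, 3, 3, 4, 11], 18),
    11: ([2] * 8 + [3, 3, 4, 4, 6, 8, 11], 26),
    12: ([2] * 7 + [3, 3, 3, 3, 6, 7, 7], 28),
    13: ([2] * 8 + [3, 3, 3, 4, 5, 6, 11], 19),
    14: ([2] * 7 + [3, 4, 4, 4, 20, 22], 12),
    15: ([2] * 9 + [3, 3], 24),
    16: ([2, 10], 12),
    17: ([2, 2, 3, 3, 3], 1),
    18: ([2, 3], 4),
    19: ([9], 6),
    20: ([2, 3, 3, 5], 2),
    21: ([2, 4, 5], 4),
    22: ([3, 35], 5),
    23: ([2, 2, 36], 5),
    24: ([2, 3, 3], 2),
    25: ([3], 0),
    26: ([3, 5], 1),
}


def summarize(table, min_insertions=1):
    """Aggregate T(a), T(c), N(c) and both clustering estimates for all
    strains carrying at least `min_insertions` IS6110 copies."""
    rows = [v for n, v in table.items() if n >= min_insertions]
    n_c = sum(len(sizes) for sizes, _ in rows)    # number of clusters
    t_c = sum(sum(sizes) for sizes, _ in rows)    # isolates inside clusters
    uniques = sum(unique for _, unique in rows)   # unmatched isolates
    t_a = t_c + uniques                           # all isolates
    return {
        "T(a)": t_a,
        "T(c)": t_c,
        "N(c)": n_c,
        "uniques": uniques,
        "total_clustering_pct": 100.0 * t_c / t_a,
        "recent_transmission_pct": 100.0 * (t_c - n_c) / t_a,
    }


ALL_ISOLATES = summarize(TABLE_1)                   # 1-26 insertions
HIGH_COPY = summarize(TABLE_1, min_insertions=5)    # excludes 1-4 insertions

print(round(ALL_ISOLATES["recent_transmission_pct"], 1))  # 64.4
print(round(HIGH_COPY["recent_transmission_pct"], 1))     # 56.7
```

Restricting the calculation to strains with five or more insertions removes 184 isolates, most of them in a few very large clusters, which is why the 68-month estimate falls from about 64% to 56.7%.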

The estimate for recent transmission (using the formula [T(c) − N(c)]/T(a), IS6110 typing, excluding 1–4 insert samples) for 1994/5 is 49.5%; for 1995/6, 45.6%; and for 1996/7, 44.4%. One might also expect that as control becomes more effective, overall incidence will decline and reactivation disease will become more dominant. These figures suggest such a trend for our community and suggest that the control strategy in force just prior to and during this period has reduced transmission rates. The current estimate of clinical cure rates in the community was 57.6% overall in 1996, 61% overall in 1997, and 81% (1997) for new sputum smear-positive cases. However, quarterly overall cure rates can vary from 36 to 72% (usually in the lower range for one community clinic and higher for the other). For this trend and conclusion to be verified, a consistent decline in clustering and increase in cure rates should be maintained over the next few years.
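A sliding-window estimate of this kind can be sketched as follows. The isolate list (month of isolation, genotype label) is entirely hypothetical, and the function names are illustrative; only the formula [T(c) − N(c)]/T(a), the 24-month window, the 12-month shift and the 68-month study length are taken from the text.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def rt_estimate(genotypes: Sequence[str]) -> Optional[float]:
    """[T(c) - N(c)]/T(a) as a percentage; None if there are no isolates."""
    t_a = len(genotypes)
    if t_a == 0:
        return None
    cluster_sizes = [n for n in Counter(genotypes).values() if n >= 2]
    return 100.0 * (sum(cluster_sizes) - len(cluster_sizes)) / t_a


def sliding_rt(
    isolates: Sequence[Tuple[int, str]],
    window: int = 24,
    step: int = 12,
    study_months: int = 68,
) -> List[Tuple[int, Optional[float]]]:
    """Estimate recent transmission in each window that fits entirely
    inside the study period; clusters are re-derived within each window."""
    results = []
    start = 0
    while start + window <= study_months:
        in_window = [g for month, g in isolates if start <= month < start + window]
        results.append((start, rt_estimate(in_window)))
        start += step
    return results


# Hypothetical isolates: (month since start of study, genotype label).
ISOLATES = [
    (1, "A"), (5, "A"), (10, "B"), (14, "A"), (20, "C"), (26, "C"), (30, "D"),
    (33, "B"), (40, "E"), (45, "E"), (50, "E"), (58, "F"), (66, "A"),
]

WINDOWS = sliding_rt(ISOLATES)
WHOLE_PERIOD = rt_estimate([g for _, g in ISOLATES])
print(WINDOWS)        # four 24-month windows starting at months 0, 12, 24, 36
print(WHOLE_PERIOD)   # larger than most window estimates
```

Because a match is only counted when both isolates fall inside the same window, lengthening the window can only add matches. In this toy data set genotypes A and B recur outside their first window, so the whole-period estimate exceeds most two-year estimates.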

Over the period of our study, we detected at least one match for 73% of the strains isolated. It is clear from Table 2 that extending the period of study increases the likelihood of a match, but this is asymptotic. It is unlikely that a 100% match will ever be found, as reactivation of infections from prior to the study period, importation of new strains to the study area, evolutionary changes in the genome, and laboratory error will impact the strain type. The results of this table would tend to support the clinical practice that a two-year window is useful to measure or estimate recent transmission.

The large number of different strains circulating in the community was unexpected. It was hypothesized that an epidemic area might have relatively few strains in circulation. However, given that approximately 60% of the disease is reactivation, this is arguably not unreasonable. However, from Table 1 it can also be seen that there are some extremely large clusters representing a disproportionate number of tuberculosis cases within this community. This table shows that (excluding isolates with 1–4 inserts) there are nine clusters of size 10 or more out of 302 strain types (2.9% of genotypes), but these represent 23% (201/881) of the database. This is strongly suggestive of two possibilities, viz., that there is rapid dissemination of a successful strain (“fitness”), or alternatively, that the social behavior of a group of hosts is conducive to the rapid spread of disease. Furthermore, this table shows two major bottlenecks in strain type: note the very low number of isolates with 6 and 18 or 19 IS6110 inserts. The significance of this is not clear, but may indicate amplification of certain successful strains where the strains with 6 or 18 insertion elements are of low fitness.

Figure 2. Age distribution of unique strains in the community.

The information gathered may potentially be used to investigate bacterial population dynamics. Assuming that the estimated recovery rate for adult pulmonary cases is 70%, and that over 68 months 73% of the strains match, it can be projected that approximately 7–10% of the potential 100% of isolates will not be matched [from 100% − 73% − (30% × 0.73) − x = 0]. This estimate is likely the sum of laboratory error, strain evolution and importation of new strains from outside the community. It has been suggested that the genome of the organism is relatively stable [11], and thus evolution or any change is unlikely to be a major component introducing new strains over a short period (such as 5 years), leaving an estimate of up to 2% new strains imported to the community per annum. Figure 2 illustrates the occurrence of unique strains as a function of the age-stratified host population. This figure suggests that new strains (truly unique) are detected at an average rate of 0.7% per annum. This component includes reactivation cases. The balance (representing unique isolates, but isolates which clearly belong to a family of strains) probably represents largely new strains arising from evolutionary change, i.e., approximately 1% per annum.

Further information concerning the dynamics of these events is shown in Fig. 2. (Note that these unique strains are regarded as unique by virtue of lacking features common to other strains or strain families.) If reactivation disease is the main driving force of the epidemic, one might expect more unique strains in older persons and very few in young people. The opposite is seen. The unique strains most likely represent importation of new strains and most likely reflect social contact outside the community. In this regard, the under-representation of unique isolates in the age group 25–65 suggests that socializing within the community in this age group is extensive, leading to rapid transmission of disease. A complete explanation of the occurrence of unique strains in infants or children will require extensive field research.

The numbers of isolates in clusters (of size ≥ 3) were examined according to the age of the patients in each cluster. This was done in four groups, such that ages 0–25 years, 26–35, 36–50, and 50 and older were grouped. Isolates were also counted into two age groups, viz., patients aged 0–35 years and 36 and older, since Fig. 2 suggests age 35 to be the appropriate median age for our TB population. The summated frequency per age group is typical of that seen in all populations; however, examining the clusters individually, three different types of frequency distribution were observed: those where an approximately even distribution between ages was noted, those where isolates were dominant in the younger age group, and those where isolates were dominant in the older age group and often absent altogether in the younger age group.

Chi-square analysis shows a significant difference in the three types (χ² = 37.86, 2 df, p = 1 × 10⁻⁸). To confirm this, a similar analysis was done comparing two large clusters (n = 32 and n = 35) and a significant difference in distribution was noted (χ² = 3.97, or 2.83, p = 0.046). This examination of clusters stratified by the age of the host suggests that the current TB epidemic is in fact not a single epidemic, but a series of epidemics. This can be illustrated according to the model presented in Fig. 3, such that at any given stage, a certain strain may be rapidly disseminating, others may be in full epidemic stage, and yet others may be waning. Some strains will represent the “average”, viz., a Gaussian-type distribution, where p = 0.96 when compared to the entire database.
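A comparison of age distributions between clusters of this kind is a Pearson chi-square test on a contingency table, which can be sketched with the Python standard library alone. The counts below are hypothetical (only the cluster sizes 32 and 35 echo the text), and the closed-form p-values are limited to 1 and 2 degrees of freedom, which is all this example needs.

```python
import math
from typing import List, Sequence, Tuple


def chi_square_statistic(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an
    r x c contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi_square_p_value(stat: float, df: int) -> float:
    """Upper-tail p-value of the chi-square distribution (df = 1 or 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("closed form implemented only for df = 1 or 2")


# Hypothetical counts: rows are two clusters (n = 32 and n = 35),
# columns are patients aged 0-35 years and 36 years or older.
AGE_TABLE: List[List[int]] = [
    [22, 10],
    [15, 20],
]

STAT, DF = chi_square_statistic(AGE_TABLE)
P_VALUE = chi_square_p_value(STAT, DF)
print(round(STAT, 2), DF, round(P_VALUE, 3))

# A statistic of 3.97 on 1 degree of freedom gives p of about 0.046.
print(round(chi_square_p_value(3.97, 1), 3))
```

For a 2 × 2 table there is a single degree of freedom, so the reported statistic of 3.97 corresponds to a p-value just under 0.05.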

Figure 3. Model of TB epidemic. The overall epidemic is hypothesized to consist of individual clones appearing and disappearing as a function of time.

Further evidence for strain differences is presented in Table 3, derived as follows: Cluster analysis of strains shows that isolates may match others exactly, or be quite closely related (similarity > 65%; Fig. 1). Thus it was possible to arrange many isolates into families and examine these more closely (Fig. 1). A family-based cluster was analyzed and an estimate of recent transmission was made for each family. This varied widely from family to family (29–72%), although the average (53%) is close to that for the whole database. In this procedure, the definition of unique strains differed from the previous exercise. In this case, unique strains could be genotypes with extensive similarity to others in the family, differing by as little as one DNA fragment out of 23, but with the fragment occurring uniquely. These are likely to be mutational (or evolutionary) events. Clearly, the exclusion of truly unique strains from the calculation (recent transmission = [T(c) − N(c)]/T(a)) will increase the estimate of recent transmission. This result is suggestive of substantial differences between strain families. Possibly, evolutionary events, such as IS6110 insertional or deletional activation or repression of individual genes, will substantially alter the phenotype of a given bacterium and give it a determined level of fitness. This fitness level, coupled with environmental opportunity, will determine opportunity for clonal expansion, with and without further evolutionary events.
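The family-level estimates can be recomputed from the counts in Table 3. This is a minimal sketch: the tuples are transcribed from Table 3 in family order, while the names and the pooled estimate (summing over all families before dividing) are illustrative choices, since the text does not state how the 53% average was formed.

```python
# One tuple per family (1)-(25), transcribed from Table 3:
# (No. of isolates, No. of clusters, No. in clusters, No. of uniques)
FAMILIES = [
    (8, 1, 5, 3), (109, 9, 87, 22), (10, 1, 8, 2), (6, 1, 3, 3), (36, 8, 27, 9),
    (21, 4, 10, 11), (154, 27, 111, 43), (9, 2, 5, 4), (10, 3, 8, 2),
    (24, 5, 17, 7), (12, 3, 8, 4), (23, 5, 12, 11), (8, 3, 6, 2), (7, 2, 6, 1),
    (12, 2, 9, 3), (72, 19, 51, 21), (9, 3, 6, 3), (17, 4, 14, 3), (18, 3, 13, 5),
    (7, 1, 3, 4), (3, 1, 2, 1), (14, 2, 6, 8), (9, 1, 7, 2), (9, 2, 8, 1),
    (2, 1, 2, 0),
]


def family_rt(isolates: int, clusters: int, in_clusters: int) -> float:
    """[T(c) - N(c)]/T(a) for one family, as a percentage."""
    return 100.0 * (in_clusters - clusters) / isolates


# Per-family estimates, in Table 3 order.
PER_FAMILY = [family_rt(t_a, n_c, t_c) for t_a, n_c, t_c, _ in FAMILIES]

# Pooled estimate: sum the counts over all families, then apply the formula.
TOTAL_ISOLATES = sum(f[0] for f in FAMILIES)
TOTAL_CLUSTERS = sum(f[1] for f in FAMILIES)
TOTAL_IN_CLUSTERS = sum(f[2] for f in FAMILIES)
POOLED = family_rt(TOTAL_ISOLATES, TOTAL_CLUSTERS, TOTAL_IN_CLUSTERS)

print(round(min(PER_FAMILY), 1), round(max(PER_FAMILY), 1))  # 28.6 71.6
print(round(POOLED, 1))                                       # 52.7
```

The recomputed values span roughly 29–72%, and pooling all 609 family isolates gives about 53%, in line with the range and average quoted in the text.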

4 Concluding remarks

The use of DNA fingerprinting to study infectious disease dynamics is a relatively new science (molecular epidemiology) and much needs to be learned. The extreme diversity of strains in a relatively small, high-incidence community is remarkable. The complex biology of M. tuberculosis is illustrated by the two components of the disease, viz. transmission and reactivation, which can be quantitated using fingerprinting technology. The dynamics of different isolates and strains can be examined and quantitated, suggesting substantial differences in biological parameters such as fitness. Tentative models may be made concerning behavioral patterns of the hosts within the community. These models of host and bacterial dynamics should be useful in designing or modifying control strategies [12] and help to fulfill the vision of those who doubt that antibiotics will be the ultimate “magic bullet” to conquer tuberculosis.

Glaxo Wellcome Action TB Initiative and NIH grant R01-A135265-03 are acknowledged for financial support; EU contract Biomed 1-BMH1-CT93-1614 for computer assistance; Dr. I. Toms (City Health Department) for cooperation with TB clinics and cure rate information; and Ms. S. Carlini for assistance with cultures.

Received February 25, 1999

5 References

[1] Dubos, R., in: Mirage of Health: Utopias, Progress and Biological Change, Anchor Books, Garden City, NY 1961.

[2] Garrett, L., in: The Coming Plague. Newly Emerging Diseases in a World out of Balance, Penguin Books, New York 1995.

[3] Stead, W. W., Clin. Chest Med. 1997, 18, 65–77.

[4] Alland, D., Kalkut, G. E., Moss, A. R., McAdam, R. A., Hahn, J. A., Bosworth, W., Drucker, E., Bloom, B., New Engl. J. Med. 1994, 330, 1710–1716.

[5] Small, P. M., Hopewell, P. C., Singh, S. P., Paz, A., Parsonnet, J., Ruston, D. C., Schecter, G. F., Daley, C. L., Schoolnik, G. K., New Engl. J. Med. 1994, 330, 1703–1708.

[6] Van Embden, J. D. A., Cave, M. D., Crawford, J. T., Dale, J. W., Eisenach, K. D., Gicquel, B., Hermans, P., Martin, C., McAdam, R., Shinnick, T. M., J. Clin. Microbiol. 1993, 31, 406–409.

[7] Warren, R., Richardson, M., Sampson, S., Hauman, J. H., Beyers, N., Donald, P. R., van Helden, P. D., J. Clin. Microbiol. 1996, 34, 2219–2224.

[8] Warren, R., Hauman, J., Beyers, N., Richardson, M., Schaaf, H. S., Donald, P., van Helden, P., S. Afr. Med. J. 1996, 86, 45–49.

[9] Wayne, L. G., Eur. J. Clin. Microbiol. Infect. Dis. 1994, 13, 908–914.

[10] Burman, W. J., Reves, R. R., Hawkes, A. P., Rietmeyer, C. A., Yang, Z., el-Hajj, H., Bates, J. N., Cave, M. D., Am. J. Respir. Crit. Care Med. 1997, 155, 1140–1146.

[11] Sreevatsan, S., Pan, X., Stockbauer, K. E., Cornell, N. D., Kreiswirth, B. N., Whittam, T. S., Musser, J. M., Proc. Natl. Acad. Sci. USA 1997, 94, 9869–9874.

[12] Blower, S. M., Small, P. M., Hopewell, P. C., Science 1996, 273, 497–500.

[13] Salamon, H., Segal, M. R., Ponce-de-Leon, A., Small, P. M., Emerg. Infect. Dis. 1998, 4, 159–168.

[14] Behr, M. A., Hopewell, P. C., Paz, E. A., Kawamura, L. M., Schecter, G. F., Small, P. M., Am. J. Respir. Crit. Care Med. 1998, 158, 465–469.


Table 3. Cluster analysis of M. tuberculosis families

Family  No. of isolates  No. of clusters  No. in clusters  No. of uniques  % RT a)
(1)     8                1                5                3               50
(2)     109              9                87               22              72
(3)     10               1                8                2               70
(4)     6                1                3                3               33
(5)     36               8                27               9               53
(6)     21               4                10               11              29
(7)     154              27               111              43              55
(8)     9                2                5                4               34
(9)     10               3                8                2               50
(10)    24               5                17               7               50
(11)    12               3                8                4               42
(12)    23               5                12               11              30
(13)    8                3                6                2               38
(14)    7                2                6                1               57
(15)    12               2                9                3               58
(16)    72               19               51               21              44
(17)    9                3                6                3               33
(18)    17               4                14               3               59
(19)    18               3                13               5               56
(20)    7                1                3                4               29
(21)    3                1                2                1               33
(22)    14               2                6                8               29
(23)    9                1                7                2               67
(24)    9                2                8                1               67
(25)    2                1                2                0               50

a) % RT calculated according to [T(c) − N(c)]/T(a)