articles DNA sequence of both chromosomes of the cholera...
Transcript of articles DNA sequence of both chromosomes of the cholera...
NATURE | VOL 406 | 3 AUGUST 2000 | www.nature.com 477
articles
DNA sequence of both chromosomes ofthe cholera pathogen Vibrio choleraeJohn F. Heidelberg*, Jonathan A. Eisen*, William C. Nelson*, Rebecca A. Clayton, Michelle L. Gwinn*, Robert J. Dodson*, Daniel H. Haft*,Erin K. Hickey*, Jeremy D. Peterson*, Lowell Umayam*, Steven R. Gill*, Karen E. Nelson*, Timothy D. Read*, Herve Tettelin*,Delwood Richardson*, Maria D. Ermolaeva*, Jessica Vamathevan*, Steven Bass*, Haiying Qin*, Ioana Dragoi*, Patrick Sellers*,Lisa McDonald*, Teresa Utterback*, Robert D. Fleishmann*, William C. Nierman*, Owen White *, Steven L. Salzberg*,Hamilton O. Smith*², Rita R. Colwell³, John J. Mekalanos§, J. Craig Venter*² & Claire M. Fraser*
* The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland 20850, USA³ Center of Marine Biotechnology, University of Maryland Biotechnology Institute, 701 East Pratt Street, Baltimore, Maryland 21202, USA, and Department of Cell and
Molecular Biology, University of Maryland, College Park, Maryland 20742, USA
§ Harvard Medical School, Department of Microbiology and Molecular Genetics, 200 Longwood Avenue, Boston, Massachusetts 02115, USA
............................................................................................................................................................................................................................................................................
Here we determine the complete genomic sequence of the Gram negative, g-Proteobacterium Vibrio cholerae El Tor N16961 to be4,033,460 base pairs (bp). The genome consists of two circular chromosomes of 2,961,146 bp and 1,072,314 bp that togetherencode 3,885 open reading frames. The vast majority of recognizable genes for essential cell functions (such as DNA replication,transcription, translation and cell-wall biosynthesis) and pathogenicity (for example, toxins, surface antigens and adhesins) arelocated on the large chromosome. In contrast, the small chromosome contains a larger fraction (59%) of hypothetical genescompared with the large chromosome (42%), and also contains many more genes that appear to have origins other than theg-Proteobacteria. The small chromosome also carries a gene capture system (the integron island) and host `addiction' genes thatare typically found on plasmids; thus, the small chromosome may have originally been a megaplasmid that was captured by anancestral Vibrio species. The V. cholerae genomic sequence provides a starting point for understanding how a free-living,environmental organism emerged to become a signi®cant human bacterial pathogen.
Vibrio cholerae is the aetiological agent of cholera, a severe diar-rhoeal disease that occurs most frequently in epidemic form1.Cholera has been epidemic in southern Asia for at least 1,000years, but also spread worldwide to cause seven pandemics since1817 (ref. 1). When untreated, cholera is a disease of extraordinarilyrapid onset and potentially high lethality. Although clinical man-agement of cholera has advanced over the past 40 years, choleraremains a serious threat in developing countries where sanitation ispoor, health care limited, and drinking water unsafe.
Vibrio cholerae as a species includes both pathogenic andnonpathogenic strains that vary in their virulence gene content2.This bacterium contains a wide variety of strains and biotypes,receiving and transferring genes for toxins3, colonization factors4,5,antibiotic resistance6, capsular polysaccharides that provide resis-tance to chlorine7 and new surface antigens, such as the 0139lipopolysaccharide and O antigen capsule8,9. The lateral or hori-zontal transfer of these virulence genes by phage3, pathogenicityislands10,11 and other accessory genetic elements12 provides insightsinto how bacterial pathogens emerge and evolve to become newstrains.
Vibrio species represent a signi®cant portion of the culturableheterotrophic bacteria of oceans, coastal waters and estuaries13,14.Environmental studies show that these bacteria strongly in¯uencenutrient cycling in the marine environment. Various species of thisgenus are also devastating pathogens for ®n®sh, shell®sh and mam-mals. There is still much to be learned about the aquatic ecology andnatural history of V. cholerae including its autochthonous (native)existence in endemic locales during cholera-free, interepidemicperiods, which environmental factors, such as climate13,15, aidedits re-emergence in Latin America, and which environmental factorsare associated with its habitat in cholera endemic regions. Forexample, V. cholerae, during interepidemic periods, is an inhabitant
of brackish and estuarine waters, and in these environments isassociated with zooplankton and other aquatic ¯ora and fauna16.The organism also enters a `̀ viable but nonculturable''17 state undercertain conditions. Roles for these environmental interactions andthis dormant physiological state in the emergence and persistence ofpathogenic V. cholerae have been proposed14,17.
Here we report the determination and analysis of the Vibriocholerae genome sequence. This analysis represents an importantstep toward the complete molecular description of how this free-living environmental organism emerged to become a humanpathogen by horizontal gene transfer.
Genome analysisThe genome of V. cholerae was sequenced by the whole genomerandom sequencing method18. The genome consists of two circularchromosomes19,20 of 2,961,146 (chromosome 1) and 1,072,314
² Present address: Celera Genomics, 45 West Gude Drive, Rockville, Maryland 20850, USA
Table 1 General features of the Vibrio cholerae genome
Chromosome 1 Chromosome 2
Size (bp) 2,961,151 1,072,914Total number of sequences 36,797 14,367G+C percentage 47.7 46.9Total number of ORFs 2,770 1,115ORF size (bp) 952 918Percentage coding 88.6 86.3Number of rRNA operons (16S-23S-5S) 8 0Number of tRNA 94 4Number similar to known proteins 1,614 (58%) 465 (42%)Number similar to proteins of unknown function* 163 (6%) 66 (6%)Number of conserved hypothetical proteins² 478 (17%) 165 (15%)Number of hypothetical proteins³ 515 (19%) 419 (38%)Number of Rho-independent terminators 599 193.............................................................................................................................................................................
bp, base pairs. ORFs, open reading frames.* Proteins of unknown function, signi®cant sequence similarity (homology) to a named protein forwhich there is currently no known function.² Conserved hypothetical protein, sequence similarity to a translation of another ORF, but there iscurrently no experimental evidence a protein is expressed.³ Hypothetical protein, no signi®cant sequence similarity to another protein.
© 2000 Macmillan Magazines Ltd
(chromosome 2) base pairs, with an average G+C content of 46.9%and 47.7%, respectively. There are a total of 3,885 predicted openreading frames (ORFs) and 792 predicted Rho-independent termi-nators; with 2,770 and 1,115 ORFs and 599 and 193 Rho-indepen-dent terminators on the individual chromosomes (Table 1, Figs 1and 2). Most genes required for growth and viability are located onchromosome 1, although some genes found only on chromosome 2are also thought to be essential for normal cell function (forexample, dsdA, thrS and the genes encoding ribosomal proteinsL20 and L35). Additionally, many intermediaries of metabolicpathways are encoded only on chromosome 2 (Fig. 3).
The replicative origin in chromosome 1 was identi®ed by simi-larity to the Vibrio harveyi and Escherichia coli origins, co-localiza-tion of genes (dnaA, dnaN, recF and gyrA) often found near theorigin in prokaryotic genomes, and GC nucleotide skew (G-C/G+C) analysis21. Based on these, we designated base-pair 1 in anintergenic region that is located in the putative origin of replication.Only the GC skew analysis was useful in identifying a putative originon chromosome 2.
This genomic sequence of V. cholerae con®rmed the presence of alarge integron island (a gene capture system) located on chromo-some 2 (125.3 kbp)22,23. The V. cholerae integron island contains allcopies of the V. cholerae repeat (VCR) sequence and 216 ORFs (Fig.1). However, most of these ORFs have no homology to othersequences. Among the recognizable integron island genes arethree that encode gene products that may be involved in drugresistance (chloramphenicol acetyltransferase, fosfomycin resis-tance protein and glutathione transferase), several DNA metabo-lism enzymes (MutT, transposase, and an integrase), potentialvirulence genes (haemagglutinin and lipoproteins) and threegenes which encode gene products similar to the `host addiction'proteins (higA, higB and doc), which are used by plasmids to selectfor their maintenance by host cells.
Comparative genomicsThe two-chromosome structure of V. cholerae allows for compar-isons, both between the two chromosomes of this organism andbetween either of the V. cholerae chromosomes and the chromo-somes of other microbial species. There is pronounced asymmetryin the distribution of genes known to be essential for growth andvirulence between the two chromosomes. Signi®cantly more genesencoding DNA replication and repair, transcription, translation,cell-wall biosynthesis and a variety of central catabolic and biosyn-thetic pathways are encoded by chromosome 1. Similarly, mostgenes known to be essential in bacterial pathogenicity (that is, thoseencoding the toxin co-regulated pilus, cholera toxin, lipopolysac-charide and the extracellular protein secretion machinery) are alsolocated on chromosome 1. In contrast, chromosome 2 contains alarger fraction (59%) of hypothetical genes and genes of unknownfunction, compared with chromosome 1 (42%) (Fig. 4). Thispartitioning of hypothetical proteins on chromosome 2 is highlylocalized in the integron island (Fig. 2). Chromosome 2 also carriesthe 3-hydroxy-3-methylglutaryl CoA reductase, a gene apparentlyacquired from an archaea (Y. Boucher and W. F. Doolittle, personalcommunication).
The majority of the V. cholerae genes were very similar to E. coligenes (1,454 ORFs), but 499 (12.8%) of the V. cholerae ORFsshowed highest similarity to other V. cholerae genes, suggestingrecent duplications (Figs 5 and 6). Most of the duplicated ORFsencode products involved in regulatory functions (59), chemotaxis(50), transport and binding (42), transposition (18), pathogenicity(13) or unknown functions encoded by conserved hypothetical (62)and hypothetical proteins (113). There are 105 duplications with atleast one of each ORF on each chromosome indicating there havebeen recent crossovers between chromosomes. The extensive dupli-cation of genes involved in scavenging behaviour (chemotaxis andsolute transport) suggests the importance of these gene products in
V. cholerae biology, notably its ability to inhabit diverse environ-ments. These environments, in turn, may have selected the duplica-tion and divergence of genes useful for specialized functions.Additionally, whereas El Tor strain N16961 carries only a singlecopy of the cholera toxin prophage, other V. cholerae strains carryseveral copies of this element24,25, and strains of the classical biotypehave a second copy of the prophage that is localized on chromosome2 (ref. 20). Thus, virulence genes are presumed to be subject toselective pressure, affecting copy number and chromosomal loca-tion.
Several ORFs with apparently identical functions exist on bothchromosomes which were probably acquired by lateral gene trans-fer. For example, glyA (encoding serine hydroxymethyl transferase)is found once on each chromosome but the phylogenetic analysissuggest the glyA copy on chromosome 1 branches with thea-Proteobacteria, whereas the copy on chromosome 2 brancheswith the g-Proteobacteria (see Supplementary Information). Thechromosome 2 glyA is ¯anked by genes encoding transposases,suggesting that this gene was acquired through a transpositionevent.
articles
478 NATURE | VOL 406 | 3 AUGUST 2000 | www.nature.com
Figure 1 Linear representation of the V. cholerae chromosomes. The location of
the predicted coding regions, colour-coded by biological role, RNA genes, tRNAs, other
RNAs, Rho-independent terminators and Vibrio cholerae repeats (VCRs) are indicated.
Arrows represent the direction of transcription for each predicted coding region. Numbers
next to the tRNAs represent the number of tRNAs at a locus. Numbers next to GES
represent the number of membrane-spanning domains predicted by the Goldman,
Engleman and Steitz scale calculated by TopPred for that protein. Gene names are
available at the TIGR web site (www.tigr.org) and as Supplementary Information.
1100,000
200,000
300,000
400,000
500,000600,000
700,000
800,000
900,000
1,000,000
1 100,000200,000
300,000
400,000
500,000
600,000
700,000
800,000
900,000
1,000,000
1,100,000
1,200,0001,300,000
1,400,0001,500,0001,600,0001,700,000
1,800,000
1,900,000
2,000,000
2,100,000
2,200,000
2,300,000
2,400,000
2,500,000
2,600,000
2,700,0002,800,000
2,900,000
Figure 2 Circular representation of the V. cholerae genome. The two chromosomes, large
and small, are depicted. From the outside inward: the ®rst and second circles show
predicted protein-coding regions on the plus and minus strand, by role, according to the
colour code in Fig. 1 (unknown and hypothetical proteins are in black). The third circle
shows recently duplicated genes on the same chromosome (black) and on different
chromosomes (green). The fourth circle shows transposon-related (black), phage-related
(blue), VCRs (pink) and pathogenesis genes (red). The ®fth circle shows regions with
signi®cant x2 values for trinucleotide composition in a 2,000-bp window. The sixth circle
shows percentage G+C in relation to mean G+C for the chromosome.The seventh and
eighth circles are tRNAs and rRNAs, respectively.
Q
© 2000 Macmillan Magazines Ltd
Origin and function of the small chromosome of V. choleraeSeveral lines of evidence suggest that chromosome 2 was originally amegaplasmid captured by an ancestral Vibrio species. The phyloge-netic analysis of the ParA homologues located near the putativeorigin of replication of each chromosome shows chromosome 1ParA tending to group with other chromosomal ParAs, and theParA from chromosome 2 tending to group with plasmid, phageand megaplasmid ParAs (see Supplementary Information). Ingeneral, genes on chromosome 2, with an apparently identicalfunctioning copy on chromosome 1, appear less similar to ortho-logues present in other g-Proteobacteria species (see Supplemen-tary Information). Also, chromosome 1 contains all the ribosomalRNA operons and at least one copy of all the transfer RNAs (fourtRNAs are found on chromosome 2, but there are duplicates onchromosome 1). In addition, chromosome 2 carries the integronregion, an element often found on plasmids26. Finally, the bias in thefunctional gene content is more easily explained, if chromosome 2
was originally a megaplasmid (Fig. 4). The megaplasmid presum-ably acquired genes from diverse bacterial species before its captureby the ancestral Vibrio. The relocation of several essential genes fromchromosome 1 to the megaplasmid completed the stable capture ofthis smaller replicon. Apparently this capture of the megaplasmidoccurred long enough ago that the trinucleotide composition andpercentage G+C content between the two chromosomes hasbecome similar (except for laterally moving elements such as theintegron island, bacteriophage genomes, transposons, and so on).The two chromosome structure is found in other Vibrio species19
suggesting that the gene content of the megaplasmid continues toprovide Vibrio with an evolutionary advantage, perhaps within theaquatic ecosystem where Vibrio species are frequently the dominantmicroorganisms14,16.
It is unclear why chromosome 2 has not been integrated intochromosome 1. Perhaps chromosome 2 plays an important special-ized function that provides the evolutionary selective pressure to
articles
NATURE | VOL 406 | 3 AUGUST 2000 | www.nature.com 479
PP
PEP
Pyruvate
Acetyl-CoAACETATE acetyl-P
citrate
cis-aconitate
isocitrate
2-oxoglutarate
succinyl-CoA
succinate
fumarate
malate
oxaloaceticacid
glyoxylate
L-GLUTAMATE
GLUCOSEGLYCOLYSIS
GLUCONEOGENESIS
ENTNERDOUDOROFFPATHWAY
NON-OXIDATIVEPENTOSE
PHOSPHATEPATHWAY
Fatty Acid Biosynthesisand Degradation
L-LACTATE
ethanolL- leucine
formate, H2+CO2
PROLINE
serine
ORNITHINEARGININE
ASPARTATE
t
LACTOSE
GLYCOGEN
propionate
L-ALANINESERINEL-TRYPTOPHAN
2-keto-isovalerate
TCA Cycle
ASPARAGINE
ATP ADP+ Pi
ATPsynthase
H+
sugar-Pi
IIBC
IIA
Pi
Pi
Pi
PTS system
ptsI
ptsH
iron (III)(2,2)
vibriobactinfepB/C/D/G
zincznuA/B/C
heminhutB/C/D
uraciluraA
xanthine/uracil family
NMN,pnuC
G3P
ugpA/B/C/EhemeccmA/B/C/D
drugs(14,3)
toxins(4,2)
lipolysaccharide/
O-antigen, rfbH/I
CheA/B/D?/R/
V/W/Y/Z
MCPs (23,20)
>40 flagellar andmotor genes
chemotacticsignals
, potE
e-
ELECTRON TRANSPORT CHAIN
H+ H+ H+
C4- dicarboxylatedctP/Q/M, dcuA/B/C
benzoate, benE
formate ? (1,1)
Na+/dicarboxylate(1,1)
Na+/citrate, citS
oxalate/formate ?
galactosidemglA/B/C
maltosemalE/F/G/K
riboserbsA/B/C/D
L-lactate ?
gluconate ?
sulphate, cysZ
phosphate, nptA ?
sulphate(2,2)
sulphatecysA/P/T/W
phosphate (1,1)pstA/B/C/S
molybdenummodA/B/C
Pho4 family protein
sugar family
AcrB/D/Ffamily (2,1)
cationsE1-E2family (3)
vibriobactinreceptorviuA
potassiumkefB (2?)
Na+/H+(4,2)
Mg2+, mgt,(2,1)
NH4+ ?
potassium, trkA/H
tyrosine, tyrP
tryptophan,mtr
AzlC family
BCCT family
amino acids (3,1)
peptides (3,1)
spermidine/putrescinepotA/B/C/D
oligopeptidesoppA/B/C/D/F
arginineartI/M/P/Q
serine,sdaC (2)
bcaa, brnQ
iron (II), feoA/B
potassium, kup
TonB systemreceptor(1,2)
ExbB(1,1) ExbD(1,1)
TonB(1,1)
porin ?
NupCFamily(2,1)
FLAGELLUM
NadCfamily
arginine/ornithineputrescine/ornithine
cadaverine/lysine?
proton/peptide family
Na+/alanine (3)
proton/glutamate, gltP(1,1)
Na+/glutamate,gltS
Na+/proline,putP
fatty acids, fadL(2,1)
MCP
NAG
sucrose?
trehalose?
cellobiose*
fructose
glucose
mannitol
maltoporinompS
PUTRESCINE
D-alanineL-alanineleucinevaline
GLUTAMINE
irgAhemehutAK+?
mscL
Glyoxylatebypass
MALATE
GALACTOSE
GLYCEROL**
MANNITOLMANNOSE
D-LACTATE
FRUCTOSE
SUCROSE
TREHALOSEN-ACETYLGLUCOSAMINE
CHITINSTARCH
melanin
L- phenylalanineL- tryptophantyrosine
GLUCONATE
PRPP
chorismate
RIBOSE
thiamine?colicinstolA/B/Q/R
glycglpF
G3PglpT
B12?btuB/C/D
MsbA?
sbp
Histidine DegradationPathway
HISTIDINE
urocanate
imidazolepropanoate
formiminoL-glutamate
L-glutamate
fumarate
METHIONINE
L-iso-leucine
propionate
O-succinyl-
homoserineglycine CO2
SERINE
THREONINE
L- cysteine
L-ASPARTATE
diaminopimelic acid
L-lysinecadaverine
Figure 3 Overview of metabolism and transport in V. cholerae. Pathways for energy
production and the metabolism of organic compounds, acids and aldehydes are shown.
Transporters are grouped by substrate speci®city: cations (green), anions (red),
carbohydrates (yellow), nucleosides, purines and pyrimidines (purple), amino acids/
peptides/amines (dark blue) and other (light blue). Question marks associated with
transporters indicate a putative gene, uncertainty in substrate speci®city, or direction of
transport. Permeases are represented as ovals; ABC transporters are shown as composite
®gures of ovals, diamonds and circles; porins are represented as three ovals; the large-
conductance mechanosensitive channel is shown as a gated cylinder; other cylinders
represent outer membrane transporters or receptors; and all other transporters are drawn
as rectangles. Export or import of solutes is designated by the direction of the arrow
through the transporter. If a precise substrate could not be determined for a transporter,
no gene name was assigned and a more general common name re¯ecting the type of
substrate being transported was used. Gene location on the two chromosomes, for both
transporters and metabolic steps, is indicated by arrow colour: all genes located on the
large chromosome (black); all genes located on the small chromosome (blue); all genes
needed for the complete pathway on one chromosome, but a duplicate copy of one or
more genes on the other chromosome (purple); required genes on both chromosomes
(red); complete pathway on both chromosomes (green). (Complete pathways, except for
glycerol, are found on the large chromosome.) Gene numbers on the two chromosomes
are in parentheses and follow the colour scheme for gene location. Substrates underlined
and capitalized can be used as energy sources. PRPP, phosphoribosyl-pyrophosphate;
PEP, phosphoenolpyruvate; PTS, phosphoenolpyruvate-dependant phosphotransferase
system; ATP, adenosine triphosphate; ADP, adenosine diphosphate; MCP, methyl-
accepting chaemotaxis protein; NAG, N-acetylglucosamine; G3P, glycerol-3-phosphate;
glyc, glycerol; NMN, nicotinamide mononucleotide. Asterisk, because V. cholerae does
not use cellobiose, we expect this PTS system to be involved in chitobiose transport.
© 2000 Macmillan Magazines Ltd
suppress integration events when they do occur. For example, ifunder some environmental condition there is a difference in copynumber between the chromosomes, then chromosome 2 may haveaccumulated genes that are better expressed at higher or lower copynumber than genes on chromosome 1. A second possibility is that,in response to environmental cues, one chromosome may partitionto daughter cells in the absence of the other chromosome (aberrantsegregation). Such single-chromosome-containing cells would bereplication-defective but still maintain metabolic activity (`drone'cells), and, therefore, be a potential source of `̀ viable, but non-culturable (VBNC)'' cells observed to occur in V. cholerae17. Such`drone' cells may also play a role in V. cholerae bio®lms7,27,28 by, forexample, producing extracellular chitinase, protease and otherdegradative enzymes that enhance survival of cells in a bio®lm,retaining two chromosomes without directly competing with thesecells for nutrients.
Transport and energy metabolismVibrio cholerae has a diverse natural habitat that includes associationwith zooplankton in a sessile stage, a planktonic state in the watercolumn, and the capacity to act as a pathogen within the humangastrointestinal tract. It is, therefore, no surprise that this organismmaintains a large repertoire of transport proteins with broadsubstrate speci®city and the corresponding catabolic pathways toenable it to respond ef®ciently to these different and constantlychanging ecosystems (Fig. 4). Many of the sugar transporter systemsand their corresponding catabolic pathway enzymes are localized ona single chromosome (that is, ribose and lactate transport anddegradation enzymes are contained on chromosome 2, whereas thetrehalose systems reside on chromosome 1). However, many of theother energy metabolism pathways are split (that is, chitin, glyco-lysis, and so on) between the chromosomes (Fig. 3).
In aquatic environments, chitin often represents a source of bothcarbon and nitrogen. This energy source is important for V. choleraeas it is associated with zooplankton, which have a chitinousexoskeleton13,15,29. Vibrio cholerae degrades chitin by a pathwaythat is very similar to that of Vibrio furnissii30. Sequence analysissuggests a phosphenolpyruvate phosphotransferase system (PTS)
for cellobiose transport, but as V. cholerae does not use cellobiose, it ismore likely that this PTS is involved in transport of the structurallysimilar compound, chitobiose, analogous to the situation proposedfor Bourrelia burgdorferi18.
The three anions that are transported by ABC transport systems inV. cholerae are molybdenum, phosphate and sulphate. Molybdenumtransport genes (modA/B/C) are all located on chromosome 2, andmost of the sulphate transport genes are on the large molecule.However, copies of the genes for phosphate transporters are foundin both chromosomes. The genes in these two phosphate transportoperons are different from each other and do not represent a recentduplication; instead, this suggests that one may be an acquired operon.
Interchromosomal regulationSeveral of the regulatory pathways, both for regulation in responseto environmental and pathogenic signals, are divided between thetwo chromosomes. These included pathways for starvation survival,`quorum sensing' and expression of the entertoxigenic haemolysin,HlyA.
During periods of nutrient starvation, V. cholerae, and otherGram-negative bacteria, enter the stationary phase and, later, theviable but nonculturable (VBNC) state14,17. The alternative sigmafactor j38 (rpoS) is required for survival of V. cholerae in theenvironment but not for pathogenicity31, and therefore probablyplays an important role in the initiation of the VBNC state. There isone copy of rpoS, located on chromosome 1, near the oriC. The RpoSregulates expression of several other proteins, including catalase,cyclopropane-fatty-acyl-phospholipid synthase and HA/protease,which are found on both chromosomes31.
Genes involved in `quorum sensing', or cell-density-dependentregulation, also exist on both chromosomes of V. cholerae. Inbioluminescent Vibrio species (notably Vibrio ®scheri and V. har-veyi), quorum sensing is used to control light production. Althoughthis strain of V. cholerae lacks the genes for bioluminescence, it doeshave the genes required for the autoinducer-2 (AI-2) quorum-sensing mechanism32 (luxOPQSU) but this pathway is split betweenthe chromosomes with luxOSU on chromosome 1 and luxPQ onchromosome 2. Similarly, another transcriptional regulatory gene,
articles
480 NATURE | VOL 406 | 3 AUGUST 2000 | www.nature.com
0
1
2
3
4
5
6
7
8
9
10
Amino
acid b
iosyn
thes
is*
Purine
s, pyr
imidine
s,
nucle
osides
, and
nuc
leotid
es
Fatty
acid an
d pho
spho
lipid m
etab
olism
Biosyn
thes
is of
cofa
ctor
s,
prosth
etic
grou
ps, an
d carri
ers*
Centra
l inte
rmed
iary m
etab
olism
*
Energ
y met
aboli
sm
Trans
port b
inding
and p
rote
ins
DNA met
aboli
sm*
Trans
criptio
n*
Prote
in sy
nthe
sis*
Prote
in fa
te*
Regula
tory
func
tions
Cell en
velop
e*
Cellula
r pro
cess
es*
Other
cate
gorie
s
Hypot
hetic
al*1
Op
en r
ead
ing
fram
es (%
)
Figure 4 Percentage of total Vibrio cholerae open reading frames (ORFs) in biological
roles compared with other g-Proteobacteria. These were V. cholerae, chromosome 1
(blue); V. cholerae, chromosome 2 (red); Escherichia coli (yellow); Haemophilus in¯uenzae
(pale blue). Signi®cant partitioning (P , 0.01) of biological roles between V. cholerae
chromosomes is indicated with an asterisk, as determined with a x2 analysis.
1, Hypothetical contains both conserved hypothetical proteins and hypothetical proteins,
and is at 1/10 scale compared with other roles.
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
Aquifex
aeoli
cus
Deinoc
occu
s rad
iodur
ans
Bacillu
s sub
tilis
Myc
oplas
ma g
enita
lium
Myc
obac
teriu
m tu
bercu
losis
Borre
lia b
urgd
orfe
ri
Trepon
ema p
allidum
Chlam
ydia
pneum
oniae
Chlam
ydia
trach
omat
is
Esche
richia
coli
Haem
ophil
us in
fluen
zae
Vibrio ch
olera
e
Neisse
ria m
ening
itidis
Ricket
tsia p
rowaz
ekii
Helico
bacte
r pylo
ri
Campylo
bacte
r jeju
ni
Synec
hocy
stis s
p.
Ther
mot
oga m
aritim
a
Aerop
yrum
per
nix
Archa
eoglo
bus fu
lgidus
Met
hano
cocc
us ja
nnas
chii
Met
hano
bacte
rium
ther
moa
utot
rophic
um
Pyroco
ccus
hor
ikosh
ii
Pyroco
ccus
abys
si
Caeno
rhab
ditis el
egan
s
Sacch
arom
yces
cere
visiae
Rel
ativ
e p
rop
ortio
n of
mos
t si
mila
rop
en r
ead
ing
fram
es
Figure 5 Comparison of the V. cholerae ORFs with those of other completely sequenced
genomes. The sequence of all proteins from each completed genome were retrieved from
NCBI, TIGR and the Caenorhabditis elegans (wormpep16) databases. All V. cholerae ORFs
(large chromosome, blue; small chromosome, red) were searched against all
other genomes with FASTA3. The number of V. cholerae ORFs with greatest similarity
(E # 10-5) are shown in proportion to the total number of ORFs in that genome. There
were no ORFs that were most similar to a Mycoplasma pneumoniae ORF.
© 2000 Macmillan Magazines Ltd
hlyU (ref. 33), is located on chromosome 1, while the gene itregulates, hlyA, is located on chromosome 2.
DNA repairVibrio cholerae has genes encoding several DNA-repair and DNA-damage-response pathways, including nucleotide-excision repair,mismatch-excision repair, base-excision repair, AP endonuclease,alkylation transfer, photoreactivation, DNA ligation, and all themajor components of recombination and recombinational repair,including initiation, recombination and resolution34. In addition,homologues of many of the genes involved in the SOS response inE. coli are found. The presence of three photolyase homologues,more than have been found in other bacterial species, probablyallows for the ability to photoreactivate the two major forms ofultraviolet-induced DNA damage (cyclobutane pyrimidine dimersand 6-4 photoproducts), and may also allow use of a range ofwavelengths of light used for the energy required for photoreactiva-tion. It is also of interest that many of the repair genes are onchromosome 2 (alkA, ada1, ada2, phr3, mutK, sbcCD, dcm, mutT3),indicating that this chromosome is probably required for full DNArepair capability.
PathogenicityToxins. The genome sequence of V. cholerae El Tor N16961 revealeda single copy of the cholera toxin (CT) genes, ctxAB, located onchromosome 1 within the integrated genome of CTXf, a temperate®lamentous phage3. The receptor for entry of CTXf into the cell isthe toxin-coregulated pilus (TCP)3, and the TCP gene cluster (seebelow) also resides on chromosome 1. Like the structural genes forCT and TCP, the regulatory gene, toxR, which controls theirexpression in vivo35, is also located on chromosome 1.
On the other side of CTXf prophage is a region encoding an RTXtoxin (rtxA), and its activator (rtxC) and transporters (rtxBD)36. Athird transporter gene has been identi®ed that is a paralogue of rtxB,and is transcribed in the same direction as rtxBD. Downstream ofthis gene are two genes encoding a sensor histidine kinase andresponse regulator. Trinucleotide composition analysis suggests thatthe RTX region was horizontally acquired along with the sensorhistidine kinase/response regulator, suggesting these regulatorseffect expression of the closely linked RTX transcriptional units.
Also present are genes encoding numerous potential toxins,including several haemolysins, proteases and lipases. These includehap, the haemagluttinin protease, a secreted metalloprotease thatseems to attack proteins involved in maintaining the integrity ofepithelial cell tight junctions37, and hlyA, encoding a secretedhaemolysin that displays enterotoxic activity38. In contrast toCTX, RTX and all known intestinal colonization factors, the hapand hlyA genes virulence factors reside on chromosome 2.
Vibrio cholerae has been reported to produce shiga-like toxins39;however, the sequence did not reveal genes encoding speci®chomologues of the A or B subunits of shiga toxin. Also not detectedwere genes encoding homologues of E. coli heat stable toxin (ST),which have been detected in other pathogenic strains of V.cholerae40.Colonization factors. The critical intestinal colonization factorof V. cholerae is the TCP, a type IV pilus5,41. The genomesequence con®rmed that the genes involved in TCP assembly(tcpABCDEFGHIJNQRST) reside on chromosome 1 (ref. 20) as partof a proposed `pathogenicity island' (also referred to as VPI)composed of recently acquired DNA that encodes not only TCP,but also other genes associated with the ToxR regulatory cascade,such as acfABCD, toxT, aldA and tagAB10,11. Trinucleotide composi-tion analysis suggest that this 45.3-kb segment begins at a 20-bp siteupstream of aldA, and encompasses a helicase-related protein and atranscriptional activator which both share homology with bacter-iophage proteins. At the other end of the segment of atypicaltrinucleotide composition is a phage family integrase and the
articles
NATURE | VOL 406 | 3 AUGUST 2000 | www.nature.com 481
V.cholerae VC0512 V.cholerae VCA1034
V.cholerae VCA0974 V.cholerae VCA0068
V.cholerae VC0825 V.cholerae VC0282
V.cholerae VCA0906 V.cholerae VCA0979
V.cholerae VCA1056 V.cholerae VC1643
V.cholerae VC2161 V.cholerae VCA0923
V.cholerae VC0514
V.cholerae VCA0773V.cholerae VC1868
V.cholerae VC1313 V.cholerae VC1859
V.cholerae VC1413 V.cholerae VCA0268
V.cholerae VCA0658 V.cholerae VC1405V.cholerae VC1298
V.cholerae VC1248V.cholerae VCA0864 V.cholerae VCA0176
V.cholerae VCA0220 V.cholerae VC1289
V.cholerae A1069V.cholerae VC2439
V.cholerae 1967V.cholerae A0031
V.cholerae VC1898 V.cholerae VCA0663
V.cholerae VCA0988V.cholerae VC0216 V.cholerae VC0449
V.cholerae VCA0008V.cholerae VC1406
V.cholerae VC1535V.cholerae VC0840
V.cholerae VC0098
E.coli gi1787690
E.coli gi1789453
C.jejuni Cj0144
C.jejuni Cj0262c
B.subtilis gi2633766Synechocystis sp. gi1001299
Synechocystis sp. gi1001300Synechocystis sp. gi1652276
Synechocystis sp. gi1652103H.pylori gi2313716H.pylori 99 gi4155097
C.jejuni Cj1190cC.jejuni Cj1110c
A.fulgidus gi2649560A.fulgidus gi2649548
B.subtilis gi2634254B.subtilis gi2632630B.subtilis gi2635607B.subtilis gi2635608B.subtilis gi2635609
B.subtilis gi2635610 B.subtilis gi2635882
E.coli gi1788195E.coli gi2367378E.coli gi1788194
V.cholerae VCA1092
H.pylori gi2313186H.pylori 99 gi4154603
C.jejuni Cj1564
C.jejuni Cj1506cH.pylori gi2313163H.pylori 99 gi4154575
H.pylori gi2313179H.pylori 99 gi4154599
C.jejuni Cj0019cC.jejuni Cj0951c
C.jejuni Cj0246cB.subtilis gi2633374
T.maritima TM0014 V.cholerae VC1403
V.cholerae VCA1088T.pallidum gi3322777
T.pallidum gi3322939T.pallidum gi3322938B.burgdorferi gi2688522
T.pallidum gi3322296B.burgdorferi gi2688521
T.maritima TM0429T.maritima TM0918T.maritima TM0023
T.maritima TM1428T.maritima TM1143
T.maritima TM1146 P.abyssi PAB1308
P.horikoshii gi3256846 P.abyssi PAB1336P.horikoshii gi3256896P.abyssi PAB2066P.horikoshii gi3258290P.abyssi PAB1026P.horikoshii gi3256884
D.radiodurans DRA00354D.radiodurans DRA0353
D.radiodurans DRA0352V.cholerae VC1394
P.abyssi PAB1189P.horikoshii gi3258414
B.burgdorferigi 2688621M.tuberculosis gi1666149
V.cholerae VC0622
**
**
***
**
****
****
***
****
**
*
****
**
**
**
****
**
****
*
****
** ****
***
*
**
***
****
**
*
**
**
*
Figure 6 Phylogenetic tree of methyl-accepting chemotactic proteins (MCP) homologues
in completed genomes. Homologues of MCP were identi®ed by FASTA3 searches of all
available complete genomes. Amino-acid sequences of the proteins were aligned using
CLUSTALW, and a neighbour-joining phylogenetic tree was generated from the alignment
using the PAUP* program (using a PAM-based distance calculation). Hypervariable
regions of the alignment and positions with gaps in many of the sequences were excluded
from the analysis. Nodes with signi®cant bootstrap values are indicated: two asterisks,
.70%; asterisk, 40±70%.
© 2000 Macmillan Magazines Ltd
other copy of the 20-bp site, which is presumably the target forintegration of the island onto the chromosome10,11. It has beenproposed that the TCP/ACF island corresponds to the genome of a®lamentous phage that uses TCP pilin as a coat protein4. However,other than the three genes encoding phage-related proteins (that is,the helicase, transcriptional activator and integrase) we could ®ndno other genes on the island that encoded products with signi®canthomology to the conserved gene products of other ®lamentousphages or the structural proteins of non®lamentous phages.
The maltose-sensitive haemagglutinin (MSHA) is unique to theEl Tor biotype of V. cholerae. Initially characterized as a haemag-glutinin, it was later found to be a type IV pilus42,43. The MSHAbiogenesis (MshHIJKLMNEGF) and structural (MshBACD) pro-teins are all clustered on chromosome 1. There are no apparentintegrases or transposases that might de®ne this region as apathogenicity island or suggest an origin for it other than V.cholerae. In support of this conclusion, trinucleotide compositionanalysis shows that this region has similar composition to the rest ofthe chromosome, suggesting that if these genes were acquired it wasvery early in the Vibrio phylogenetic history. Recently, severalinvestigators have reported that MSHA is not required for intestinalcolonization, nor does it seem to appreciably affect the ef®ciency ofcolonization44±46, but instead plays a role in bio®lm formation27,28.Accordingly, this pilus may be important for the environmental®tness of Vibrio species rather than for pathogenic potential.
The pilA region of V. cholerae genome apparently encodes a thirdtype IV pilus, although it has not been visualized47. This gene clusterincludes a gene encoding a prepilin peptidase (PilD) that is requiredfor the ef®cient processing of protein complexes with type IVprepilin-like signal sequences including TCP, MSHA and EPS47,48.The EPS system of V. cholerae encodes a type II secretion systeminvolved in extracellular export of CT and other proteins. The EPSsystem is encoded by chromosome 1 but, like the MSHA genes,trinucleotide analysis suggests that the EPS genes of V. cholerae havenot been recently acquired. In contrast, trinucleotide compositionanalysis suggests that the pilA gene cluster was acquired by hori-zontal transfer. Thus, analysis of the V. cholerae genome sequenceprovides some evidence that older gene clusters, like MSHA andEPS, have become dependent on newly acquired genes such as PilD.
ConclusionsThe Vibrio cholerae genome sequence provides a new starting pointfor the study of this organism's environmental and pathobiologicalcharacteristics. It will be interesting to determine the gene expres-sion patterns that are unique to its survival and replication duringhuman infection35 as well as in the environment13,14,16. Additionally,the genomic sequence of V. cholerae should facilitate the study ofthis model multi-chromosomal prokaryotic organism. Compara-tive genomics between several species in the genus Vibrio willprovide a better understanding of the origin of the new smallchromosome and the role that it plays in Vibrio biology. Thegenome sequence may also provide important clues to understand-ing the metabolic and regulatory networks that link genes on thetwo chromosomes. Finally, V. cholerae clearly represents a promis-ing genetic system for studying how several horizontally acquiredloci located on separate chromosomes can still ef®ciently interact atthe regulatory, cell biology and biochemical levels. M
MethodsWhole-genome random sequencing procedure.
Vibrio cholerae N16961 was grown from a single isolated colony. Cloning, sequencing andassembly were as described for genomes sequenced by TIGR18. One small-insert plasmidlibrary (2±3 kb) was generated by random mechanical shearing of genomic DNA. Onelarge insert library was ligated into l-DASHII/EcoRI vector (Stratagene). In the initialsequence phase, approximately sevenfold sequence coverage was achieved with 49,633sequences from plasmid clones. Sequences from both ends of 383 l-clones served as agenome scaffold, verifying the orientation, order and integrity of the contigs. The plasmidand l sequences were jointly assembled using TIGR Assembler. Sequence gaps were closed
by editing the end sequences and/or primer walking on plasmid clones. Physical gaps wereclosed by direct sequencing of genomic DNA, or combinatorial polymerase chain reaction(PCR) followed by sequencing the PCR product. The ®nal genome sequence is based on51,164 sequences.
ORF prediction and gene family identi®cation. An initial set of ORFs, likely to encodeproteins, was identi®ed with GLIMMER49, and those shorter than 30 codons wereeliminated. ORFs that overlapped were visually inspected, and in some cases removed.ORFs were searched against a non-redundant protein database18. Frameshifts and pointmutations were detected and corrected where appropriate. Remaining frameshifts andpoint mutations are considered to be authentic and were annotated as `authentic frame-shift' or `authentic point mutation'. ORFs were also analysed with two sets of hiddenMarkov models (HMMs) constructed for a number of conserved protein families (1,313from Pfam v3.1 (ref. 50) and 476 from the TIGRFAM) by use of the HMMER package.TopPred was used to identify membrane-spanning domains in proteins.
Paralogous gene families were constructed by searching the ORFs against themselvesusing BLASTX, identifying matches with E # 10-5 over 60% of the query search length,and subsequently clustering these matches into multigene families. Multiple alignmentsfor these protein families were generated with the CLUSTALW program and thealignments scrutinized.
Distribution of all 64 trinucleotides (3-mers) for each chromosome was determined,and the 3-mer distribution in 2,000-bp windows that overlapped by half their length(1,000 bp) across the genome was computed. For each window, we computed the x2
statistic on the difference between its 3-mer content and that of the whole chromosome. Alarge value of this statistic indicates that the 3-mer composition in this window is differentfrom the rest of the chromosome. Probability values for this analysis are based on theassumption that the DNA composition is relatively uniform throughout the genome.Because this assumption may be incorrect, we prefer to interpret high x2 values merely asindicators of regions on the chromosome that appear unusual and demand further scrutiny.
Homologues of the genes of interest were identi®ed using the BLASTP and FASTA3search programs. All homologues were then aligned to each other using the CLUSTALWprogram with default settings. Phylogenetic trees were generated from the alignmentsusing the neighbour-joining algorithm as implemented by the PAUP* program (with aPAM matrix based distance calculation). Regions of the alignment that were hypervariableor were of low con®dence were excluded from the phylogenetic analysis. All alignments areavailable upon request.
Received 3 April; accepted 18 May 2000.
1. Wachsmuth, K., Olsvik, é., Evins, G. M. & Popovic, T. in Vibrio Cholerae And Cholera: Molecular To
Global Perspective (eds Wachsmuth, I. K., Blake, P. A. & Olsvik, é.) 357±370 (ASM Press, Washington
DC, 1994).
2. Faruque, S. M., Albert, M. J. & Mekalanos, J. J. Epidemiology, genetics, and ecology of toxigenic Vibrio
cholerae. Microbiol. Mol. Biol. Rev. 62, 1301±1314 (1998).
3. Waldor, M. K. & Mekalanos, J. J. Lysogenic conversion by a ®lamentous phage encoding cholera toxin.
Science 272, 1910±1914 (1996).
4. Karaolis, D. K., Somara, S., Maneval, D. R. Jr, Johnson, J. A. & Kaper, J. B. A bacteriophage encoding a
pathogenicity island, a type-IV pilus and a phage receptor in cholera bacteria. Nature 399, 375±379
(1999).
5. Brown, R. C. & Taylor, R. K. Organization of tcp, acf, and toxT genes within a ToxT-dependent
operon. Mol. Microbiol. 16, 425±439 (1995).
6. Hochhut, B. & Waldor, M. K. Site-speci®c integration of the conjugal Vibrio cholerae SXTelement into
prfC. Mol. Microbiol. 32, 99±110 (1999).
7. Yildiz, F. H. & Schoolnik, G. K. Vibrio cholerae O1 El Tor: identi®cation of a gene cluster required for
the rugose colony type, exopolysaccharide production, chlorine resistance, and bio®lm formation.
Proc. Natl Acad. Sci. USA 96, 4028±4033 (1999).
8. Bik, E. M., Bunschoten, A. E., Gouw, R. D. & Mooi, F. R. Genesis of the novel epidemic Vibrio cholerae
O139 strain: evidence for horizontal transfer of genes involved in polysaccharide synthesis. EMBO J
14, 209±216 (1995).
9. Waldor, M. K., Colwell, R. & Mekalanos, J. J. The Vibrio cholerae O139 serogroup antigen includes an
O-antigen capsule and lipopolysaccharide virulence determinants. Proc. Natl Acad. Sci. USA 91,
11388±11392 (1994).
10. Kovach, M. E., Shaffer, M. D. & Peterson, K. M. A putative integrase gene de®nes the distal end of a
large cluster of ToxR-regulated colonization genes in Vibrio cholerae. Microbiology 142, 2165±2174
(1996).
11. Karaolis, D. K. et al. A Vibrio cholerae pathogenicity island associated with epidemic and pandemic
strains. Proc. Natl Acad. Sci. USA 95, 3134±3139 (1998).
12. Mekalanos, J. J., Rubin, E. J. & Waldor, M. K. Cholera: molecular basis for emergence and
pathogenesis. FEMS Immunol. Med. Microbiol. 18, 241±248 (1997).
13. Colwell, R. R. Global climate and infectious disease: the cholera paradigm. Science 274, 2025±2031
(1996).
14. Colwell, R. R. & Spira, W. M. in Cholera (eds Barua, D. & Greenough, W. B. III) 107±127 (Plenum
Medical, New York, 1992).
15. Lobitz, B. et al. Climate and infectious disease: use of remote sensing for detection of Vibrio cholerae by
indirect measurement. Proc. Natl Acad. Sci. USA 97, 1438±1443 (2000).
16. Colwell, R. R. & Huq, A. Environmental reservoir of Vibrio cholerae. The causative agent of cholera.
Ann. NY Acad. Sci. 740, 44±54 (1994).
17. Roszak, D. B. & Colwell, R. R. Survival strategies of bacteria in the natural environment. Microbiol.
Rev. 51, 365±379 (1987).
18. Fraser, C. M. et al. Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi. Nature 390,
580±586 (1997).
19. Yamaichi, Y., Iida, T., Park, K. S., Yamamoto, K. & Honda, T. Physical and genetic map of the genome
of Vibrio parahaemolyticus: presence of two chromosomes in Vibrio species. Mol. Microbiol. 31, 1513±
1521 (1999).
20. Trucksis, M., Michalski, J., Deng, Y. K. & Kaper, J. B. The Vibrio cholerae genome contains two unique
circular chromosomes. Proc. Natl Acad. Sci. USA 95, 14464±14469 (1998).
articles
482 NATURE | VOL 406 | 3 AUGUST 2000 | www.nature.com© 2000 Macmillan Magazines Ltd
21. Lobry, J. R. Asymmetric substitution patterns in the two DNA strands of bacteria. Mol. Biol. Evol. 13,
660±665 (1996).
22. Rowe-Magnus, D. A., Guerout, A. M. & Mazel, D. Super-integrons. Res. Microbiol. 150, 641±651
(1999).
23. Hall, R. M., Brookes, D. E. & Stokes, H. W. Site-speci®c insertion of genes into integrons: role of the
59-base element and determination of the recombination cross-over point. Mol. Microbiol. 5, 1941±
1959 (1991).
24. Davis, B. M., Kimsey, H. H., Chang, W. & Waldor, M. K. The Vibrio cholerae O139 Calcutta
bacteriophage CTXphi is infectious and encodes a novel repressor. J. Bacteriol. 181, 6779±6787
(1999).
25. Mekalanos, J. J. Duplication and ampli®cation of toxin genes in Vibrio cholerae. Cell 35, 253±263
(1983).
26. Mazel, D., Dychinco, B., Webb, V. A. & Davies, J. A distinctive class of integron in the Vibrio cholerae
genome. Science 280, 605±608 (1998).
27. Watnick, P. I. & Kolter, R. Steps in the development of a Vibrio cholerae El Tor bio®lm. Mol. Microbiol.
34, 586±595 (1999).
28. Watnick, P. I., Fullner, K. J. & Kolter, R. A role for the mannose-sensitive hemagglutinin in bio®lm
formation by Vibrio cholerae El Tor. J. Bacteriol. 181, 3606±3609 (1999).
29. Huq, A. et al. Ecological relationships between Vibrio cholerae and planktonic crustacean copepods.
Appl. Environ. Microbiol. 45, 275±283 (1983).
30. Bassler, B. L., Yu, C., Lee, Y. C. & Roseman, S. Chitin utilization by marine bacteria. Degradation
and catabolism of chitin oligosaccharides by Vibrio furnissii. J. Biol. Chem. 266, 24276±24286
(1991).
31. Yildiz, F. H. & Schoolnik, G. K. Role of rpoS in stress survival and virulence of Vibrio cholerae. J.
Bacteriol. 180, 773±784 (1998).
32. Bassler, B. L., Greenberg, E. P. & Stevens, A. M. Cross-species induction of luminescence in the
quorum-sensing bacterium Vibrio harveyi. J. Bacteriol. 179, 4043±4045 (1997).
33. Williams, S. G., Attridge, S. R. & Manning, P. A. The transcriptional activator HlyU of Vibrio cholerae:
nucleotide sequence and role in virulence gene expression. Mol. Microbiol. 9, 751±760 (1993).
34. Eisen, J. A. & Hanawalt, P. C. A phylogenomic study of DNA repair genes, proteins, and processes.
Mutat. Res. 435, 171±213 (1999).
35. Lee, S. H., Hava, D. L., Waldor, M. K. & Camilli, A. Regulation and temporal expression patterns of
Vibrio cholerae virulence genes during infection. Cell 99, 625±634 (1999).
36. Lin, W. et al. Identi®cation of a Vibrio cholerae RTX toxin gene cluster that is tightly linked to the
cholera toxin prophage. Proc. Natl Acad. Sci. USA 96, 1071±1076 (1999).
37. Wu, Z., Milton, D., Nybom, P., Sjo, A. & Magnusson, K. E. Vibrio cholerae hemagglutinin/protease
(HA/protease) causes morphological changes in cultured epithelial cells and perturbs their para-
cellular barrier function. Microb. Pathog. 21, 111±123 (1996).
38. Alm, R. A., Stroeher, U. H. & Manning, P. A. Extracellular proteins of Vibrio cholerae: nucleotide
sequence of the structural gene (hlyA) for the haemolysin of the haemolytic El Tor strain 017 and
characterization of the hlyA mutation in the non- haemolytic classical strain 569B. Mol. Microbiol. 2,
481±488 (1988).
39. O'Brien, A. D., Chen, M. E., Holmes, R. K., Kaper, J. & Levine, M. M. Environmental and human
isolates of Vibrio cholerae and Vibrio parahaemolyticus produce a Shigella dysenteriae 1 (Shiga)-like
cytotoxin. Lancet 1, 77±78 (1984).
40. Ogawa, A., Kato, J., Watanabe, H., Nair, B. G. & Takeda, T. Cloning and nucleotide sequence of a heat-
stable enterotoxin gene from Vibrio cholerae non-O1 isolated from a patient with traveler's diarrhea.
Infect. Immun. 58, 3325±3329 (1990).
41. Manning, P. A. The tcp gene cluster of Vibrio cholerae. Gene 192, 63±70 (1997).
42. Jonson, G., Holmgren, J. & Svennerholm, A. M. Identi®cation of a mannose-binding pilus on Vibrio
cholerae El Tor. Microb. Pathog. 11, 433±441 (1991).
43. Jonson, G., Lebens, M. & Holmgren, J. Cloning and sequencing of Vibrio cholerae mannose-sensitive
haemagglutinin pilin gene: localization of mshA within a cluster of type 4 pilin genes. Mol. Microbiol.
13, 109±118 (1994).
44. Attridge, S. R., Manning, P. A., Holmgren, J. & Jonson, G. Relative signi®cance of mannose-sensitive
hemagglutinin and toxin-coregulated pili in colonization of infant mice by Vibrio cholerae El Tor.
Infect. Immun. 64, 3369±3373 (1996).
45. Tacket, C. O. et al. Investigation of the roles of toxin-coregulated pili and mannose- sensitive
hemagglutinin pili in the pathogenesis of Vibrio cholerae O139 infection. Infect. Immun. 66, 692±695
(1998).
46. Thelin, K. H. & Taylor, R. K. Toxin-coregulated pilus, but not mannose-sensitive hemagglutinin, is
required for colonization by Vibrio cholerae O1 El Tor biotype and O139 strains. Infect. Immun. 64,
2853±2856 (1996).
47. Fullner, K. J. & Mekalanos, J. J. Genetic characterization of a new type IV-A pilus gene cluster found in
both classical and El Tor biotypes of Vibrio cholerae. Infect. Immun. 67, 1393±1404 (1999).
48. Marsh, J. W. & Taylor, R. K. Identi®cation of the Vibrio cholerae type 4 prepilin peptidase required for
cholera toxin secretion and pilus formation. Mol. Microbiol. 29, 1481±1492 (1998).
49. Salzberg, S. L., Delcher, A. L., Kasif, S. & White, O. Microbial gene identi®cation using interpolated
Markov models. Nucleic Acids Res. 26, 544±548 (1998).
50. Bateman, A. et al. Pfam 3.1: 1313 multiple alignments and pro®le HMMs match the majority of
proteins. Nucleic Acids Res. 27, 260±262 (1999).
Supplementary information is available on Nature's World-Wide Web site (http://www.nature.com) or as paper copy from the London editorial of®ce of Nature.
Acknowledgements
This work was supported by the National Institutes of Health, National Institute of Allergyand Infectious Disease. We thank M. Heaney, V. Sapiro, B. Lee, M. Holmes and B. Vincentfor database and software support.
Correspondence and requests for materials should be addressed to C.M.F.(e-mail: [email protected]). The annotated genome sequence and the gene family alignmentsare available at hhttp://www.tigr.org/tbd/mdbi. The sequences have been deposited inGenBank with accession number AE003852 (chromosome 1) and AE003853 (chromo-some 2).
articles
NATURE | VOL 406 | 3 AUGUST 2000 | www.nature.com 483© 2000 Macmillan Magazines Ltd
articles
484 NATURE | VOL 406 | 3 AUGUST 2000 | www.nature.com
56
67
1112
910
611
6
58
146
66
95
1911
95
1313
914
913
1212
55
1912
5
57
55
98
913
68
117
69
119
117
69
1113
125
86
56
69
99
126
712
125
912
119
10
59
75
139
1011
56
66
88
86
59
512
1211
87
115
610
910
59
1213
1211
65
1110
13
712
77
88
614
1210
66
67
125
59
612
56
89
13
145
67
115
1110
1011
1011
11
117
59
65
75
58
12
1210
1010
89
6
126
510
512
511
712
911
1213
1013
86
116
156
77
88
65
77
712
810
712
126
58
1110
109
99
66
910
611
66
105
125
810
613
118
512
59
1211
611
611
125
2
5
48
54
18
2
1
4
4
2
1
1
2
161
12
1
14
3
16S
b
16S
d
16S
f
16S
g16
Sh
23S
a
23S
b
23S
c
23S
d
23S
e
23S
f
23S
g23
Sh
5Sb
5Sc
5Sd
5Sg
5Sh
RN
aseP
RN
A
SR
PR
NA
tmR
NA
59
913
68
811
75
1011
1111
99
128
610
177
1222
109
911
1212
59
910
12
115
1010
126
67
511
8
59
66
1110
1210
1210
1314
128
916
58
98
512
1210
56
911
5
59
86
139
65
96
8
85
1014
135
55
95
136
9
812
136
56
1
2
1
Mem
bran
e pr
otei
n
Am
ino
acid
bio
synt
hesi
s
Bio
synt
hesi
s of
cof
acto
rs, p
rost
hetic
gro
ups,
and
car
riers
Cel
l env
elop
e
Cel
lula
r pr
oces
ses
Cen
tral
inte
rmed
iary
met
abol
ism
DN
A m
etab
olis
m
Ene
rgy
met
abol
ism
Fatty
aci
d an
d ph
osph
olip
id m
etab
olis
m
Reg
ulat
ory
func
tions
Tran
scrip
tion
Tran
spor
t and
bin
ding
pro
tein
s
Nuc
leot
ides
Pro
tein
fate
/pro
tein
syn
thes
is
Con
serv
ed h
ypot
hetic
al
No
data
base
mat
ch
21
16S
c2
1
1
16S
a1
1
16S
e1
1
12
34
56
78
910
1213
1415
1618
1920
2122
2324
2627
2829
3031
3233
3435
3637
3940
4142
4344
4546
4748
4950
5152
5455
5657
5860
6162
6364
6566
6768
6970
7172
7374
7576
7879
8081
8283
8485
8687
8889
9091
9293
9495
9798
9910
0
103
105
106
108
111
112
113
114
115
116
117
118
119
120
122
125
126
127
128
129
130
131
132
134
135
136
137
139
140
142
144
146
147
148
149
150
151
152
153
154
156
157
158
159
161
162
164
165
166
167
168
170
171
172
173
174
175
177
178
179
180
181
182
183
184
185
186
187
188
190
191
192
194
194
195
196
198
199
200
201
202
203
204
206
207
208
209
210
211
212
213
214
215
216
217
220
221
222
223
224
225
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
249
250
251
252
254
256
257
258
259
260
262
263
264
265
266
268
267
269
270
271
274
275
276
277
278
280
281
282
283
284
285
286
287
288
289
291
293
295
296
297
298
299
300
302
303
304
305
306
307
308
309
312
315
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
348
349
350
353
354
355
356
357
359
360
361
362
365
366
369
370
371
372
373
374
376
377
378
379
381
384
385
385
386
389
390
391
392
393
394
395
396
397
398
399
400
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
428
429
430
431
432
433
434
437
438
439
440
441
442
443
444
445
446
447
449
450
452
453
454
455
456
457
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
480
481
482
483
485
486
487
488
489
490
491
492
493
494
495
496
498
499
500
501
502
503
505
506
508
509
510
512
513
514
515
516
517
518
519
521
522
523
524
525
526
528
529
530
531
532
533
534
535
537
538
539
540
541
542
543
544
545
547
549
550
551
552
553
554
556
557
558
559
560
562
563
564
565
566
566
567
568
570
571
573
574
575
576
577
578
579
580
582
581
583
585
586
587
589
590
591
592
593
594
595
596
597
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
626
627
628
629
630
631
632
633
634
636
637
638
639
640
641
642
643
644
645
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
667
668
671
672
673
674
675
676
677
678
680
681
682
683
684
685
687
688
689
690
691
692
693
694
695
696
697
698
700
701
702
703
704
705
706
708
709
710
711
714
715
717
718
719
720
721
722
723
724
725
726
727
728
730
731
732
734
736
737
739
741
742
743
744
745
746
747
748
749
750
751
752
753
755
756
757
758
759
760
761
762
763
764
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
783
784
786
787
788
790
791
792
793
794
795
796
798
799
800
801
802
803
804
806
807
809
810
811
812
813
814
815
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
851
852
853
854
855
856
857
858
859
860
861
862
863
864
866
869
870
873
875
876
880
881
884
886
888
889
890
892
893
894
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
913
914
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
933
934
934
935
936
937
938
939
940
941
942
943
944
947
948
949
950
951
953
954
956
957
958
959
960
961
962
963
964
965
968
969
970
971
972
973
974
975
976
977
979
980
981
983
984
985
986
987
988
990
991
992
993
994
995
997
998
999
1000
1001
1002
1003
1004
1005
1006
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1021
1023
1024
1025
1026
1028
1029
1031
1033
1034
1035
1037
1036
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1053
1054
1056
1058
1059
1060
1061
1062
1063
1064
1066
1067
1069
1070
1071
1073
1074
1075
1076
1077
1079
1081
1083
1084
1085
1086
1087
1088
1089
1091
1092
1093
1094
1095
1096
1097
1098
1099
1101
1102
1103
1104
1105
1106
1107
1108
1110
1111
1112
1113
1114
1115
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1144
1145
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1188
1190
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1209
1210
1211
1212
1213
1214
1215
1216
1217
1219
1220
1224
1226
1228
1229
1231
1232
1234
1235
1236
1237
1238
1239
1240
1242
1244
1245
1246
1248
1249
1250
1252
1253
1255
1256
1257
1258
1259
1260
1261
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1282
1284
1285
1286
1287
1288
1289
1290
1291
1293
1295
1296
1297
1298
1300
1301
1302
1303
1304
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1317
1318
1319
1320
1321
1323
1325
1327
1328
1329
1330
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
1349
1348
1350
1353
1354
1358
1359
1360
1361
1362
1363
1364
1365
1366
1367
1369
1370
1371
1372
1373
1374
1375
1376
1377
1379
1382
1384
1386
1388
1390
1391
1392
139
413
9713
9914
0014
0114
0214
0314
0514
06
1407
1408
1409
1410
1411
1413
1414
1415
1416
1417
1418
1419
1420
1421
1422
1423
1424
1425
1426
1427
1428
1429
1430
1431
1432
1433
1434
1435
1437
1438
1439
1441
1442
1444
1445
1446
1447
1448
1450
1451
1454
1457
1458
1460
1463
1465
1467
1469
1470
1473
1475
1476
1478
1482
1481
1483
1485
1486
1488
1490
1491
1492
1492
1494
1495
1496
1497
1499
1498
1500
1501
1502
1503
1504
1505
1506
1507
1508
1509
1511
1512
1513
1515
1516
1517
1518
1519
1520
1521
1522
1523
1524
1525
1526
1527
1529
1532
1534
1533
1535
1536
1537
1538
1539
1540
1541
1542
1543
1544
1546
1547
1548
1549
1550
1551
1552
1553
1554
1555
1557
1558
1560
1561
1562
1563
1565
1566
1567
1568
1570
1571
1573
1575
1577
1579
1580
1581
1582
1583
1584
1585
1587
1588
1589
1590
1591
1592
1593
1594
1595
1596
1597
1598
1599
1600
1601
1602
1603
1604
1605
1606
1607
1608
1609
1610
1611
1612
1614
1615
1617
1618
1619
1620
1621
1622
1623
1624
1625
1627
1628
1629
1630
1631
1632
1633
1634
1636
1635
1638
1639
1641
1643
1644
1645
1647
1649
1650
1651
1652
1653
1655
1656
1658
1659
1660
1663
1664
1665
1666
1667
1669
1670
1671
1672
1673
1674
1675
1678
1679
1680
1681
1682
1683
1684
1685
1686
1687
1688
1689
1690
1692
1693
1695
1697
1698
1700
1701
1703
1704
1706
1708
1709
1710
1711
1712
1713
1714
1715
1716
1717
1718
1719
1720
1721
1722
1723
1725
1726
1727
1730
1732
1735
1736
1738
1740
1741
1743
1745
1746
1748
1749
1750
1751
1753
1754
1755
1756
1757
1758
1759
1760
1760
1761
1762
1763
1764
1765
1766
1767
1768
1769
1770
1771
1772
1773
1774
1775
1776
1777
1778
1779
1781
1782
1783
1784
1786
1788
1789
1791
1794
1797
1798
1799
1800
1803
1805
1806
1807
1808
1811
1814
1815
1817
1819
1820
1821
1822
1824
1825
1826
1827
1828
1831
1832
1833
1834
1835
1836
1837
1838
1839
1840
1843
1844
1845
1846
1847
1848
1849
1850
1851
1852
1853
1854
1855
1856
1857
1859
1860
1861
1862
1863
1864
1865
1866
1867
1868
1869
1870
1871
1872
1873
1874
1875
1877
1878
1880
1879
1882
1883
1884
1885
1886
1887
1888
1889
1890
1892
1893
1894
1896
1898
1899
1900
1901
1902
1903
1904
1905
1906
1907
1909
1910
1911
1912
1915
1916
1918
1920
1921
1922
1923
1925
1926
1927
1928
1929
1931
1933
1934
1935
1936
1937
1938
1939
1940
1941
1942
1944
1945
1947
1949
1950
1951
1952
1953
1955
1956
1959
1960
1962
1963
1964
1965
1966
1967
1968
1970
1971
1972
1973
1974
1975
1976
1977
1978
1979
1981
1980
1983
1984
1985
1986
1987
1989
1991
1990
1992
1993
1994
1995
1997
1998
2000
2001
2002
2003
2004
2006
2007
2008
2009
2011
2012
2013
2014
2015
2016
2017
2018
2019
2021
2022
2023
2024
2026
2027
2028
2030
2031
2032
2033
2035
2036
2037
2038
2039
2041
2042
2043
2045
2046
2047
2048
2049
2051
2052
2053
2055
2056
2057
2058
2059
2060
2061
2062
2063
2064
2066
2067
2068
2069
2070
2072
2073
2074
2075
2077
2080
2081
2082
2083
2084
2085
2086
2087
2088
2089
2092
2093
2094
2095
2096
2097
2099
2100
2101
2103
2106
2107
2108
2109
2110
2111
2113
2115
2116
2118
2119
2120
2121
2123
2126
2127
2128
2129
2130
2131
2132
2133
2135
2136
2137
2140
2141
2142
2143
2144
2145
2146
2152
2153
2156
2157
2159
2160
2161
2162
2164
2166
2167
2168
2171
2172
2174
2175
2176
2178
2179
2180
2181
2182
2183
2184
2185
2187
2188
2190
2191
2192
2193
2194
2195
2196
2197
2198
2201
2202
2203
2205
2206
2207
2208
2209
2210
2211
2212
2213
2214
2215
2216
2217
2220
2222
2223
2224
2225
2226
2227
2229
2230
2231
2232
2233
2234
2235
2236
2237
2238
2240
2242
2244
2245
2245
2246
2247
2248
2249
2250
2251
2252
2253
2254
2255
2256
2257
2258
2259
2260
2261
2262
2265
2266
2267
2268
2269
2270
2271
2272
2273
2274
2276
2277
2278
2279
2280
2281
2282
2283
2285
2286
2287
2289
2290
2291
2292
2293
2294
2295
2297
2298
2299
2300
2301
2302
2303
2305
2307
2308
2309
2310
2311
2312
2316
2319
2320
2322
2323
2324
2329
2330
2332
2333
2334
2335
2337
2338
2339
2340
2341
2342
2343
2344
2344
2345
2346
2347
2348
2349
2350
2352
2353
2355
2356
2358
2359
2360
2362
2363
2364
2365
2366
2368
2369
2370
2371
2373
2374
2376
2377
2378
2379
2380
2381
2382
2383
2384
2385
2386
2387
2388
2389
2390
2391
2394
2395
2396
2397
2398
2399
2400
2401
2402
2403
2404
2405
2406
2407
2409
2412
2413
2414
2415
2416
2417
2418
2419
2420
2421
2422
2423
2424
2425
2425
2426
2427
2428
2430
2431
2432
2433
2434
2435
2436
2437
2438
2439
2440
2441
2442
2443
2444
2445
2446
2447
2448
2450
2451
2452
2453
2454
2456
2458
2459
246
024
6124
6224
6324
6424
6524
6624
6724
6924
7024
7224
7424
7524
7624
7924
8024
8124
8224
8324
8424
8524
8724
8824
8924
9024
9124
9224
9324
9424
9724
9824
9925
0025
0125
0225
0325
0425
0525
0625
0725
0825
10
2510
2511
2513
2514
2517
2518
2519
2520
2522
2523
2524
2525
2527
2528
2529
2531
2532
2534
2535
2536
2537
2538
2539
2541
2542
2544
2545
2547
2548
2549
2550
255
225
5325
5425
5525
5725
5825
5925
6025
6125
6225
6325
6425
6525
6625
6725
6825
6925
7125
7225
7625
7725
7925
8125
8425
9025
9325
9525
9625
9825
9926
0026
0126
0226
0326
0426
0626
0726
0826
0926
1026
1326
1426
1526
1626
1726
1826
1926
2026
2126
2226
2326
2426
2526
2626
2726
28
2628
2629
2630
2631
2632
2633
2634
2635
2636
2637
2638
2641
2642
2643
2644
2645
2646
2647
2649
2651
2653
2654
2655
2656
2657
2660
2661
2662
2664
2666
2668
267
026
7126
7226
7326
7426
7526
7626
7726
7826
8126
8326
8426
8526
8826
8926
9026
9126
9226
9326
9426
9526
9626
9726
9826
9927
0027
0127
0227
0327
0527
0627
0827
1027
1127
1227
1327
1427
1527
1627
1727
1927
1827
2027
2127
2227
2327
2427
2527
26
2727
2729
2730
2731
2732
2733
2734
2736
2738
2739
2740
2741
2742
2743
2744
2746
2747
2748
2749
2750
2751
2755
2756
2757
2758
2759
2760
2761
2762
2764
2765
2766
2767
2768
2770
2772
2773
2774
2775
12
34
56
78
910
1113
1415
1617
1819
2021
2223
2425
2627
2829
3031
3233
3435
3637
3839
4041
4243
4445
4748
4950
5152
5355
5657
5859
6163
6465
6667
6870
7172
7374
7576
7778
7980
8283
8485
8688
90
9192
9495
9697
9899
100
101
102
103
104
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
123
124
127
128
129
130
131
132
133
134
135
136
137
139
140
141
142
144
146
147
148
150
151
152
154
155
156
157
159
160
161
164
165
167
168
169
170
171
172
173
174
175
176
178
179
180
181
181
182
183
186
189
190
191
192
193
194
195
196
197
198
199
200
201
202
204
205
207
208
210
211
212
213
214
217
218
219
220
221
222
223
225
227
228
229
230
231
232
233
235
237
238
239
240
241
242
243
244
245
246
247
248
249
250
253
254
255
256
257
258
261
262
263
264
265
266
267
268
269
270
271
272
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
290
291
293
294
299
300
301
303
307
308
309
310
311
313
316
317
318
322
325
328
330
331
334
337
338
340
341
344
345
346
347
349
350
351
354
355
356
358
360
361
363
365
366
367
368
36937
037
137
237
437
537
637
938
038
238
538
738
839
139
539
639
739
940
040
140
240
540
640
840
941
041
341
441
541
641
741
942
042
142
342
442
542
843
143
243
343
643
944
044
144
244
3
447
448
449
450
451
453
454
455
457
458
459
460
463
464
465
468
470
472
474
475
476
479
481
483
485
486
487
490
491
493
495
496
498
501
504
505
506
507
508
510
511
512
513
514
516
517
518
519
520
521
522
523
524
526
527
528
529
530
531
532
534
535
536
537
538
539
540
542
543
544
545
546
549
550
552
554
556
557
558
559
560
561
563
564
565
566
567
568
571
572
573
574
575
576
578
580
581
582
583
584
585
586
587
588
589
590
591
592
593
595
596
599
600
601
602
603
604
605
606
607
608
610
612
614
615
616
617
618
619
620
621
623
624
625
627
628
629
630
631
632
633
634
635
637
638
639
640
641
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
661
662
663
663
665
666
667
669
671
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
697
699
700
702
703
704
705
706
707
708
710
709
711
712
714
716
717
718
719
720
723
724
725
726
727
728
729
730
732
734
735
736
737
738
740
741
744
745
747
748
749
751
752
753
754
756
757
758
759
760
762
763
764
765
766
767
768
769
772
773
774
776
777
778
779
780
781
783
782
784
785
786
788
789
790
791
792
793
795
798
799
800
801
802
803
804
805
806
807
808
809
811
812
813
814
815
817
818
820
822
823
824
825
827
828
829
830
832
833
834
835
836
837
838
840
843
846
847
848
849
850
851
852
853
854
855
856
859
860
862
863
864
865
867
870
871
872
873
875
876
877
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
917
920
921
922
923
924
925
926
927
928
929
930
931
935
936
938
937
939
940
941
943
944
945
946
947
949
951
952
954
955
956
957
958
960
961
962
963
964
965
969
971
972
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
996
997
998
999
1000
1001
1002
1004
1005
1006
1008
1010
1011
1012
1013
1015
1017
1018
1019
1020
1023
1025
1026
1027
1028
1029
1031
1033
1034
1036
1037
1038
1039
1040
1041
1043
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1062
1063
1067
1068
1069
1071
1072
1073
1074
1075
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1088
1089
1090
1091
1092
1093
1094
1095
1098
1099
1100
1101
1102
1104
1105
1108
1109
1110
1111
1112
1113
1114
1115
16s
rRN
A
23s
rRN
A
5s r
RN
A
3
Rep
eat r
egio
n
Thr
ee tR
NA
s
Rho
-ind.
term
inat
orO
ther
cat
egor
ies
1 K
b
Oth
er R
NA
© 2000 Macmillan Magazines Ltd