articles DNA sequence of both chromosomes of the cholera...

NATURE | VOL 406 | 3 AUGUST 2000 | www.nature.com 477

articles

DNA sequence of both chromosomes ofthe cholera pathogen Vibrio choleraeJohn F. Heidelberg*, Jonathan A. Eisen*, William C. Nelson*, Rebecca A. Clayton, Michelle L. Gwinn*, Robert J. Dodson*, Daniel H. Haft*,Erin K. Hickey*, Jeremy D. Peterson*, Lowell Umayam*, Steven R. Gill*, Karen E. Nelson*, Timothy D. Read*, HerveÂ Tettelin*,Delwood Richardson*, Maria D. Ermolaeva*, Jessica Vamathevan*, Steven Bass*, Haiying Qin*, Ioana Dragoi*, Patrick Sellers*,Lisa McDonald*, Teresa Utterback*, Robert D. Fleishmann*, William C. Nierman*, Owen White *, Steven L. Salzberg*,Hamilton O. Smith*², Rita R. Colwell³, John J. Mekalanos§, J. Craig Venter*² & Claire M. Fraser*

* The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland 20850, USA³ Center of Marine Biotechnology, University of Maryland Biotechnology Institute, 701 East Pratt Street, Baltimore, Maryland 21202, USA, and Department of Cell and

Molecular Biology, University of Maryland, College Park, Maryland 20742, USA

§ Harvard Medical School, Department of Microbiology and Molecular Genetics, 200 Longwood Avenue, Boston, Massachusetts 02115, USA

............................................................................................................................................................................................................................................................................

Here we determine the complete genomic sequence of the Gram negative, g-Proteobacterium Vibrio cholerae El Tor N16961 to be4,033,460 base pairs (bp). The genome consists of two circular chromosomes of 2,961,146 bp and 1,072,314 bp that togetherencode 3,885 open reading frames. The vast majority of recognizable genes for essential cell functions (such as DNA replication,transcription, translation and cell-wall biosynthesis) and pathogenicity (for example, toxins, surface antigens and adhesins) arelocated on the large chromosome. In contrast, the small chromosome contains a larger fraction (59%) of hypothetical genescompared with the large chromosome (42%), and also contains many more genes that appear to have origins other than theg-Proteobacteria. The small chromosome also carries a gene capture system (the integron island) and host `addiction' genes thatare typically found on plasmids; thus, the small chromosome may have originally been a megaplasmid that was captured by anancestral Vibrio species. The V. cholerae genomic sequence provides a starting point for understanding how a free-living,environmental organism emerged to become a signi®cant human bacterial pathogen.

Vibrio cholerae is the aetiological agent of cholera, a severe diar-rhoeal disease that occurs most frequently in epidemic form1.Cholera has been epidemic in southern Asia for at least 1,000years, but also spread worldwide to cause seven pandemics since1817 (ref. 1). When untreated, cholera is a disease of extraordinarilyrapid onset and potentially high lethality. Although clinical man-agement of cholera has advanced over the past 40 years, choleraremains a serious threat in developing countries where sanitation ispoor, health care limited, and drinking water unsafe.

Vibrio cholerae as a species includes both pathogenic andnonpathogenic strains that vary in their virulence gene content2.This bacterium contains a wide variety of strains and biotypes,receiving and transferring genes for toxins3, colonization factors4,5,antibiotic resistance6, capsular polysaccharides that provide resis-tance to chlorine7 and new surface antigens, such as the 0139lipopolysaccharide and O antigen capsule8,9. The lateral or hori-zontal transfer of these virulence genes by phage3, pathogenicityislands10,11 and other accessory genetic elements12 provides insightsinto how bacterial pathogens emerge and evolve to become newstrains.

Vibrio species represent a signi®cant portion of the culturableheterotrophic bacteria of oceans, coastal waters and estuaries13,14.Environmental studies show that these bacteria strongly in¯uencenutrient cycling in the marine environment. Various species of thisgenus are also devastating pathogens for ®n®sh, shell®sh and mam-mals. There is still much to be learned about the aquatic ecology andnatural history of V. cholerae including its autochthonous (native)existence in endemic locales during cholera-free, interepidemicperiods, which environmental factors, such as climate13,15, aidedits re-emergence in Latin America, and which environmental factorsare associated with its habitat in cholera endemic regions. Forexample, V. cholerae, during interepidemic periods, is an inhabitant

of brackish and estuarine waters, and in these environments isassociated with zooplankton and other aquatic ¯ora and fauna16.The organism also enters a `̀ viable but nonculturable''17 state undercertain conditions. Roles for these environmental interactions andthis dormant physiological state in the emergence and persistence ofpathogenic V. cholerae have been proposed14,17.

Here we report the determination and analysis of the Vibriocholerae genome sequence. This analysis represents an importantstep toward the complete molecular description of how this free-living environmental organism emerged to become a humanpathogen by horizontal gene transfer.

Genome analysisThe genome of V. cholerae was sequenced by the whole genomerandom sequencing method18. The genome consists of two circularchromosomes19,20 of 2,961,146 (chromosome 1) and 1,072,314

² Present address: Celera Genomics, 45 West Gude Drive, Rockville, Maryland 20850, USA

Table 1 General features of the Vibrio cholerae genome

Chromosome 1 Chromosome 2

Size (bp) 2,961,151 1,072,914Total number of sequences 36,797 14,367G+C percentage 47.7 46.9Total number of ORFs 2,770 1,115ORF size (bp) 952 918Percentage coding 88.6 86.3Number of rRNA operons (16S-23S-5S) 8 0Number of tRNA 94 4Number similar to known proteins 1,614 (58%) 465 (42%)Number similar to proteins of unknown function* 163 (6%) 66 (6%)Number of conserved hypothetical proteins² 478 (17%) 165 (15%)Number of hypothetical proteins³ 515 (19%) 419 (38%)Number of Rho-independent terminators 599 193.............................................................................................................................................................................

bp, base pairs. ORFs, open reading frames.* Proteins of unknown function, signi®cant sequence similarity (homology) to a named protein forwhich there is currently no known function.² Conserved hypothetical protein, sequence similarity to a translation of another ORF, but there iscurrently no experimental evidence a protein is expressed.³ Hypothetical protein, no signi®cant sequence similarity to another protein.

© 2000 Macmillan Magazines Ltd

(chromosome 2) base pairs, with an average G+C content of 46.9%and 47.7%, respectively. There are a total of 3,885 predicted openreading frames (ORFs) and 792 predicted Rho-independent termi-nators; with 2,770 and 1,115 ORFs and 599 and 193 Rho-indepen-dent terminators on the individual chromosomes (Table 1, Figs 1and 2). Most genes required for growth and viability are located onchromosome 1, although some genes found only on chromosome 2are also thought to be essential for normal cell function (forexample, dsdA, thrS and the genes encoding ribosomal proteinsL20 and L35). Additionally, many intermediaries of metabolicpathways are encoded only on chromosome 2 (Fig. 3).

The replicative origin in chromosome 1 was identi®ed by simi-larity to the Vibrio harveyi and Escherichia coli origins, co-localiza-tion of genes (dnaA, dnaN, recF and gyrA) often found near theorigin in prokaryotic genomes, and GC nucleotide skew (G-C/G+C) analysis21. Based on these, we designated base-pair 1 in anintergenic region that is located in the putative origin of replication.Only the GC skew analysis was useful in identifying a putative originon chromosome 2.

This genomic sequence of V. cholerae con®rmed the presence of alarge integron island (a gene capture system) located on chromo-some 2 (125.3 kbp)22,23. The V. cholerae integron island contains allcopies of the V. cholerae repeat (VCR) sequence and 216 ORFs (Fig.1). However, most of these ORFs have no homology to othersequences. Among the recognizable integron island genes arethree that encode gene products that may be involved in drugresistance (chloramphenicol acetyltransferase, fosfomycin resis-tance protein and glutathione transferase), several DNA metabo-lism enzymes (MutT, transposase, and an integrase), potentialvirulence genes (haemagglutinin and lipoproteins) and threegenes which encode gene products similar to the `host addiction'proteins (higA, higB and doc), which are used by plasmids to selectfor their maintenance by host cells.

Comparative genomicsThe two-chromosome structure of V. cholerae allows for compar-isons, both between the two chromosomes of this organism andbetween either of the V. cholerae chromosomes and the chromo-somes of other microbial species. There is pronounced asymmetryin the distribution of genes known to be essential for growth andvirulence between the two chromosomes. Signi®cantly more genesencoding DNA replication and repair, transcription, translation,cell-wall biosynthesis and a variety of central catabolic and biosyn-thetic pathways are encoded by chromosome 1. Similarly, mostgenes known to be essential in bacterial pathogenicity (that is, thoseencoding the toxin co-regulated pilus, cholera toxin, lipopolysac-charide and the extracellular protein secretion machinery) are alsolocated on chromosome 1. In contrast, chromosome 2 contains alarger fraction (59%) of hypothetical genes and genes of unknownfunction, compared with chromosome 1 (42%) (Fig. 4). Thispartitioning of hypothetical proteins on chromosome 2 is highlylocalized in the integron island (Fig. 2). Chromosome 2 also carriesthe 3-hydroxy-3-methylglutaryl CoA reductase, a gene apparentlyacquired from an archaea (Y. Boucher and W. F. Doolittle, personalcommunication).

The majority of the V. cholerae genes were very similar to E. coligenes (1,454 ORFs), but 499 (12.8%) of the V. cholerae ORFsshowed highest similarity to other V. cholerae genes, suggestingrecent duplications (Figs 5 and 6). Most of the duplicated ORFsencode products involved in regulatory functions (59), chemotaxis(50), transport and binding (42), transposition (18), pathogenicity(13) or unknown functions encoded by conserved hypothetical (62)and hypothetical proteins (113). There are 105 duplications with atleast one of each ORF on each chromosome indicating there havebeen recent crossovers between chromosomes. The extensive dupli-cation of genes involved in scavenging behaviour (chemotaxis andsolute transport) suggests the importance of these gene products in

V. cholerae biology, notably its ability to inhabit diverse environ-ments. These environments, in turn, may have selected the duplica-tion and divergence of genes useful for specialized functions.Additionally, whereas El Tor strain N16961 carries only a singlecopy of the cholera toxin prophage, other V. cholerae strains carryseveral copies of this element24,25, and strains of the classical biotypehave a second copy of the prophage that is localized on chromosome2 (ref. 20). Thus, virulence genes are presumed to be subject toselective pressure, affecting copy number and chromosomal loca-tion.

Several ORFs with apparently identical functions exist on bothchromosomes which were probably acquired by lateral gene trans-fer. For example, glyA (encoding serine hydroxymethyl transferase)is found once on each chromosome but the phylogenetic analysissuggest the glyA copy on chromosome 1 branches with thea-Proteobacteria, whereas the copy on chromosome 2 brancheswith the g-Proteobacteria (see Supplementary Information). Thechromosome 2 glyA is ¯anked by genes encoding transposases,suggesting that this gene was acquired through a transpositionevent.

articles

478 NATURE | VOL 406 | 3 AUGUST 2000 | www.nature.com

Figure 1 Linear representation of the V. cholerae chromosomes. The location of

the predicted coding regions, colour-coded by biological role, RNA genes, tRNAs, other

RNAs, Rho-independent terminators and Vibrio cholerae repeats (VCRs) are indicated.

Arrows represent the direction of transcription for each predicted coding region. Numbers

next to the tRNAs represent the number of tRNAs at a locus. Numbers next to GES

represent the number of membrane-spanning domains predicted by the Goldman,

Engleman and Steitz scale calculated by TopPred for that protein. Gene names are

available at the TIGR web site (www.tigr.org) and as Supplementary Information.

1100,000

200,000

300,000

400,000

500,000600,000

700,000

800,000

900,000

1,000,000

1 100,000200,000

300,000

400,000

500,000

600,000

700,000

800,000

900,000

1,000,000

1,100,000

1,200,0001,300,000

1,400,0001,500,0001,600,0001,700,000

1,800,000

1,900,000

2,000,000

2,100,000

2,200,000

2,300,000

2,400,000

2,500,000

2,600,000

2,700,0002,800,000

2,900,000

Figure 2 Circular representation of the V. cholerae genome. The two chromosomes, large

and small, are depicted. From the outside inward: the ®rst and second circles show

predicted protein-coding regions on the plus and minus strand, by role, according to the

colour code in Fig. 1 (unknown and hypothetical proteins are in black). The third circle

shows recently duplicated genes on the same chromosome (black) and on different

chromosomes (green). The fourth circle shows transposon-related (black), phage-related

(blue), VCRs (pink) and pathogenesis genes (red). The ®fth circle shows regions with

signi®cant x2 values for trinucleotide composition in a 2,000-bp window. The sixth circle

shows percentage G+C in relation to mean G+C for the chromosome.The seventh and

eighth circles are tRNAs and rRNAs, respectively.

Q


Origin and function of the small chromosome of V. choleraeSeveral lines of evidence suggest that chromosome 2 was originally amegaplasmid captured by an ancestral Vibrio species. The phyloge-netic analysis of the ParA homologues located near the putativeorigin of replication of each chromosome shows chromosome 1ParA tending to group with other chromosomal ParAs, and theParA from chromosome 2 tending to group with plasmid, phageand megaplasmid ParAs (see Supplementary Information). Ingeneral, genes on chromosome 2, with an apparently identicalfunctioning copy on chromosome 1, appear less similar to ortho-logues present in other g-Proteobacteria species (see Supplemen-tary Information). Also, chromosome 1 contains all the ribosomalRNA operons and at least one copy of all the transfer RNAs (fourtRNAs are found on chromosome 2, but there are duplicates onchromosome 1). In addition, chromosome 2 carries the integronregion, an element often found on plasmids26. Finally, the bias in thefunctional gene content is more easily explained, if chromosome 2

was originally a megaplasmid (Fig. 4). The megaplasmid presum-ably acquired genes from diverse bacterial species before its captureby the ancestral Vibrio. The relocation of several essential genes fromchromosome 1 to the megaplasmid completed the stable capture ofthis smaller replicon. Apparently this capture of the megaplasmidoccurred long enough ago that the trinucleotide composition andpercentage G+C content between the two chromosomes hasbecome similar (except for laterally moving elements such as theintegron island, bacteriophage genomes, transposons, and so on).The two chromosome structure is found in other Vibrio species19

suggesting that the gene content of the megaplasmid continues toprovide Vibrio with an evolutionary advantage, perhaps within theaquatic ecosystem where Vibrio species are frequently the dominantmicroorganisms14,16.

It is unclear why chromosome 2 has not been integrated intochromosome 1. Perhaps chromosome 2 plays an important special-ized function that provides the evolutionary selective pressure to

articles


PP

PEP

Pyruvate

Acetyl-CoAACETATE acetyl-P

citrate

cis-aconitate

isocitrate

2-oxoglutarate

succinyl-CoA

succinate

fumarate

malate

oxaloaceticacid

glyoxylate

L-GLUTAMATE

GLUCOSEGLYCOLYSIS

GLUCONEOGENESIS

ENTNERDOUDOROFFPATHWAY

NON-OXIDATIVEPENTOSE

PHOSPHATEPATHWAY

Fatty Acid Biosynthesisand Degradation

L-LACTATE

ethanolL- leucine

formate, H2+CO2

PROLINE

serine

ORNITHINEARGININE

ASPARTATE

t

LACTOSE

GLYCOGEN

propionate

L-ALANINESERINEL-TRYPTOPHAN

2-keto-isovalerate

TCA Cycle

ASPARAGINE

ATP ADP+ Pi

ATPsynthase

H+

sugar-Pi

IIBC

IIA

Pi

Pi

Pi

PTS system

ptsI

ptsH

iron (III)(2,2)

vibriobactinfepB/C/D/G

zincznuA/B/C

heminhutB/C/D

uraciluraA

xanthine/uracil family

NMN,pnuC

G3P

ugpA/B/C/EhemeccmA/B/C/D

drugs(14,3)

toxins(4,2)

lipolysaccharide/

O-antigen, rfbH/I

CheA/B/D?/R/

V/W/Y/Z

MCPs (23,20)

>40 flagellar andmotor genes

chemotacticsignals

, potE

e-

ELECTRON TRANSPORT CHAIN

H+ H+ H+

C4- dicarboxylatedctP/Q/M, dcuA/B/C

benzoate, benE

formate ? (1,1)

Na+/dicarboxylate(1,1)

Na+/citrate, citS

oxalate/formate ?

galactosidemglA/B/C

maltosemalE/F/G/K

riboserbsA/B/C/D

L-lactate ?

gluconate ?

sulphate, cysZ

phosphate, nptA ?

sulphate(2,2)

sulphatecysA/P/T/W

phosphate (1,1)pstA/B/C/S

molybdenummodA/B/C

Pho4 family protein

sugar family

AcrB/D/Ffamily (2,1)

cationsE1-E2family (3)

vibriobactinreceptorviuA

potassiumkefB (2?)

Na+/H+(4,2)

Mg2+, mgt,(2,1)

NH4+ ?

potassium, trkA/H

tyrosine, tyrP

tryptophan,mtr

AzlC family

BCCT family

amino acids (3,1)

peptides (3,1)

spermidine/putrescinepotA/B/C/D

oligopeptidesoppA/B/C/D/F

arginineartI/M/P/Q

serine,sdaC (2)

bcaa, brnQ

iron (II), feoA/B

potassium, kup

TonB systemreceptor(1,2)

ExbB(1,1) ExbD(1,1)

TonB(1,1)

porin ?

NupCFamily(2,1)

FLAGELLUM

NadCfamily

arginine/ornithineputrescine/ornithine

cadaverine/lysine?

proton/peptide family

Na+/alanine (3)

proton/glutamate, gltP(1,1)

Na+/glutamate,gltS

Na+/proline,putP

fatty acids, fadL(2,1)

MCP

NAG

sucrose?

trehalose?

cellobiose*

fructose

glucose

mannitol

maltoporinompS

PUTRESCINE

D-alanineL-alanineleucinevaline

GLUTAMINE

irgAhemehutAK+?

mscL

Glyoxylatebypass

MALATE

GALACTOSE

GLYCEROL**

MANNITOLMANNOSE

D-LACTATE

FRUCTOSE

SUCROSE

TREHALOSEN-ACETYLGLUCOSAMINE

CHITINSTARCH

melanin

L- phenylalanineL- tryptophantyrosine

GLUCONATE

PRPP

chorismate

RIBOSE

thiamine?colicinstolA/B/Q/R

glycglpF

G3PglpT

B12?btuB/C/D

MsbA?

sbp

Histidine DegradationPathway

HISTIDINE

urocanate

imidazolepropanoate

formiminoL-glutamate

L-glutamate

fumarate

METHIONINE

L-iso-leucine

propionate

O-succinyl-

homoserineglycine CO2

SERINE

THREONINE

L- cysteine

L-ASPARTATE

diaminopimelic acid

L-lysinecadaverine

Figure 3 Overview of metabolism and transport in V. cholerae. Pathways for energy

production and the metabolism of organic compounds, acids and aldehydes are shown.

Transporters are grouped by substrate speci®city: cations (green), anions (red),

carbohydrates (yellow), nucleosides, purines and pyrimidines (purple), amino acids/

peptides/amines (dark blue) and other (light blue). Question marks associated with

transporters indicate a putative gene, uncertainty in substrate speci®city, or direction of

transport. Permeases are represented as ovals; ABC transporters are shown as composite

®gures of ovals, diamonds and circles; porins are represented as three ovals; the large-

conductance mechanosensitive channel is shown as a gated cylinder; other cylinders

represent outer membrane transporters or receptors; and all other transporters are drawn

as rectangles. Export or import of solutes is designated by the direction of the arrow

through the transporter. If a precise substrate could not be determined for a transporter,

no gene name was assigned and a more general common name re¯ecting the type of

substrate being transported was used. Gene location on the two chromosomes, for both

transporters and metabolic steps, is indicated by arrow colour: all genes located on the

large chromosome (black); all genes located on the small chromosome (blue); all genes

needed for the complete pathway on one chromosome, but a duplicate copy of one or

more genes on the other chromosome (purple); required genes on both chromosomes

(red); complete pathway on both chromosomes (green). (Complete pathways, except for

glycerol, are found on the large chromosome.) Gene numbers on the two chromosomes

are in parentheses and follow the colour scheme for gene location. Substrates underlined

and capitalized can be used as energy sources. PRPP, phosphoribosyl-pyrophosphate;

PEP, phosphoenolpyruvate; PTS, phosphoenolpyruvate-dependant phosphotransferase

system; ATP, adenosine triphosphate; ADP, adenosine diphosphate; MCP, methyl-

accepting chaemotaxis protein; NAG, N-acetylglucosamine; G3P, glycerol-3-phosphate;

glyc, glycerol; NMN, nicotinamide mononucleotide. Asterisk, because V. cholerae does

not use cellobiose, we expect this PTS system to be involved in chitobiose transport.


suppress integration events when they do occur. For example, ifunder some environmental condition there is a difference in copynumber between the chromosomes, then chromosome 2 may haveaccumulated genes that are better expressed at higher or lower copynumber than genes on chromosome 1. A second possibility is that,in response to environmental cues, one chromosome may partitionto daughter cells in the absence of the other chromosome (aberrantsegregation). Such single-chromosome-containing cells would bereplication-defective but still maintain metabolic activity (`drone'cells), and, therefore, be a potential source of `̀ viable, but non-culturable (VBNC)'' cells observed to occur in V. cholerae17. Such`drone' cells may also play a role in V. cholerae bio®lms7,27,28 by, forexample, producing extracellular chitinase, protease and otherdegradative enzymes that enhance survival of cells in a bio®lm,retaining two chromosomes without directly competing with thesecells for nutrients.

Transport and energy metabolismVibrio cholerae has a diverse natural habitat that includes associationwith zooplankton in a sessile stage, a planktonic state in the watercolumn, and the capacity to act as a pathogen within the humangastrointestinal tract. It is, therefore, no surprise that this organismmaintains a large repertoire of transport proteins with broadsubstrate speci®city and the corresponding catabolic pathways toenable it to respond ef®ciently to these different and constantlychanging ecosystems (Fig. 4). Many of the sugar transporter systemsand their corresponding catabolic pathway enzymes are localized ona single chromosome (that is, ribose and lactate transport anddegradation enzymes are contained on chromosome 2, whereas thetrehalose systems reside on chromosome 1). However, many of theother energy metabolism pathways are split (that is, chitin, glyco-lysis, and so on) between the chromosomes (Fig. 3).

In aquatic environments, chitin often represents a source of bothcarbon and nitrogen. This energy source is important for V. choleraeas it is associated with zooplankton, which have a chitinousexoskeleton13,15,29. Vibrio cholerae degrades chitin by a pathwaythat is very similar to that of Vibrio furnissii30. Sequence analysissuggests a phosphenolpyruvate phosphotransferase system (PTS)

for cellobiose transport, but as V. cholerae does not use cellobiose, it ismore likely that this PTS is involved in transport of the structurallysimilar compound, chitobiose, analogous to the situation proposedfor Bourrelia burgdorferi18.

The three anions that are transported by ABC transport systems inV. cholerae are molybdenum, phosphate and sulphate. Molybdenumtransport genes (modA/B/C) are all located on chromosome 2, andmost of the sulphate transport genes are on the large molecule.However, copies of the genes for phosphate transporters are foundin both chromosomes. The genes in these two phosphate transportoperons are different from each other and do not represent a recentduplication; instead, this suggests that one may be an acquired operon.

Interchromosomal regulationSeveral of the regulatory pathways, both for regulation in responseto environmental and pathogenic signals, are divided between thetwo chromosomes. These included pathways for starvation survival,`quorum sensing' and expression of the entertoxigenic haemolysin,HlyA.

During periods of nutrient starvation, V. cholerae, and otherGram-negative bacteria, enter the stationary phase and, later, theviable but nonculturable (VBNC) state14,17. The alternative sigmafactor j38 (rpoS) is required for survival of V. cholerae in theenvironment but not for pathogenicity31, and therefore probablyplays an important role in the initiation of the VBNC state. There isone copy of rpoS, located on chromosome 1, near the oriC. The RpoSregulates expression of several other proteins, including catalase,cyclopropane-fatty-acyl-phospholipid synthase and HA/protease,which are found on both chromosomes31.

Genes involved in `quorum sensing', or cell-density-dependentregulation, also exist on both chromosomes of V. cholerae. Inbioluminescent Vibrio species (notably Vibrio ®scheri and V. har-veyi), quorum sensing is used to control light production. Althoughthis strain of V. cholerae lacks the genes for bioluminescence, it doeshave the genes required for the autoinducer-2 (AI-2) quorum-sensing mechanism32 (luxOPQSU) but this pathway is split betweenthe chromosomes with luxOSU on chromosome 1 and luxPQ onchromosome 2. Similarly, another transcriptional regulatory gene,

articles


0

1

2

3

4

5

6

7

8

9

10

Amino

acid b

iosyn

thes

is*

Purine

s, pyr

imidine

s,

nucle

osides

, and

nuc

leotid

es

Fatty

acid an

d pho

spho

lipid m

etab

olism

Biosyn

thes

is of

cofa

ctor

s,

prosth

etic

grou

ps, an

d carri

ers*

Centra

l inte

rmed

iary m

etab

olism

*

Energ

y met

aboli

sm

Trans

port b

inding

and p

rote

ins

DNA met

aboli

sm*

Trans

criptio

n*

Prote

in sy

nthe

sis*

Prote

in fa

te*

Regula

tory

func

tions

Cell en

velop

e*

Cellula

r pro

cess

es*

Other

cate

gorie

s

Hypot

hetic

al*1

Op

en r

ead

ing

fram

es (%

)

Figure 4 Percentage of total Vibrio cholerae open reading frames (ORFs) in biological

roles compared with other g-Proteobacteria. These were V. cholerae, chromosome 1

(blue); V. cholerae, chromosome 2 (red); Escherichia coli (yellow); Haemophilus in¯uenzae

(pale blue). Signi®cant partitioning (P , 0.01) of biological roles between V. cholerae

chromosomes is indicated with an asterisk, as determined with a x2 analysis.

1, Hypothetical contains both conserved hypothetical proteins and hypothetical proteins,

and is at 1/10 scale compared with other roles.

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

Aquifex

aeoli

cus

Deinoc

occu

s rad

iodur

ans

Bacillu

s sub

tilis

Myc

oplas

ma g

enita

lium

Myc

obac

teriu

m tu

bercu

losis

Borre

lia b

urgd

orfe

ri

Trepon

ema p

allidum

Chlam

ydia

pneum

oniae

Chlam

ydia

trach

omat

is

Esche

richia

coli

Haem

ophil

us in

fluen

zae

Vibrio ch

olera

e

Neisse

ria m

ening

itidis

Ricket

tsia p

rowaz

ekii

Helico

bacte

r pylo

ri

Campylo

bacte

r jeju

ni

Synec

hocy

stis s

p.

Ther

mot

oga m

aritim

a

Aerop

yrum

per

nix

Archa

eoglo

bus fu

lgidus

Met

hano

cocc

us ja

nnas

chii

Met

hano

bacte

rium

ther

moa

utot

rophic

um

Pyroco

ccus

hor

ikosh

ii

Pyroco

ccus

abys

si

Caeno

rhab

ditis el

egan

s

Sacch

arom

yces

cere

visiae

Rel

ativ

e p

rop

ortio

n of

mos

t si

mila

rop

en r

ead

ing

fram

es

Figure 5 Comparison of the V. cholerae ORFs with those of other completely sequenced

genomes. The sequence of all proteins from each completed genome were retrieved from

NCBI, TIGR and the Caenorhabditis elegans (wormpep16) databases. All V. cholerae ORFs

(large chromosome, blue; small chromosome, red) were searched against all

other genomes with FASTA3. The number of V. cholerae ORFs with greatest similarity

(E # 10-5) are shown in proportion to the total number of ORFs in that genome. There

were no ORFs that were most similar to a Mycoplasma pneumoniae ORF.


hlyU (ref. 33), is located on chromosome 1, while the gene itregulates, hlyA, is located on chromosome 2.

DNA repairVibrio cholerae has genes encoding several DNA-repair and DNA-damage-response pathways, including nucleotide-excision repair,mismatch-excision repair, base-excision repair, AP endonuclease,alkylation transfer, photoreactivation, DNA ligation, and all themajor components of recombination and recombinational repair,including initiation, recombination and resolution34. In addition,homologues of many of the genes involved in the SOS response inE. coli are found. The presence of three photolyase homologues,more than have been found in other bacterial species, probablyallows for the ability to photoreactivate the two major forms ofultraviolet-induced DNA damage (cyclobutane pyrimidine dimersand 6-4 photoproducts), and may also allow use of a range ofwavelengths of light used for the energy required for photoreactiva-tion. It is also of interest that many of the repair genes are onchromosome 2 (alkA, ada1, ada2, phr3, mutK, sbcCD, dcm, mutT3),indicating that this chromosome is probably required for full DNArepair capability.

PathogenicityToxins. The genome sequence of V. cholerae El Tor N16961 revealeda single copy of the cholera toxin (CT) genes, ctxAB, located onchromosome 1 within the integrated genome of CTXf, a temperate®lamentous phage3. The receptor for entry of CTXf into the cell isthe toxin-coregulated pilus (TCP)3, and the TCP gene cluster (seebelow) also resides on chromosome 1. Like the structural genes forCT and TCP, the regulatory gene, toxR, which controls theirexpression in vivo35, is also located on chromosome 1.

On the other side of CTXf prophage is a region encoding an RTXtoxin (rtxA), and its activator (rtxC) and transporters (rtxBD)36. Athird transporter gene has been identi®ed that is a paralogue of rtxB,and is transcribed in the same direction as rtxBD. Downstream ofthis gene are two genes encoding a sensor histidine kinase andresponse regulator. Trinucleotide composition analysis suggests thatthe RTX region was horizontally acquired along with the sensorhistidine kinase/response regulator, suggesting these regulatorseffect expression of the closely linked RTX transcriptional units.

Also present are genes encoding numerous potential toxins,including several haemolysins, proteases and lipases. These includehap, the haemagluttinin protease, a secreted metalloprotease thatseems to attack proteins involved in maintaining the integrity ofepithelial cell tight junctions37, and hlyA, encoding a secretedhaemolysin that displays enterotoxic activity38. In contrast toCTX, RTX and all known intestinal colonization factors, the hapand hlyA genes virulence factors reside on chromosome 2.

Vibrio cholerae has been reported to produce shiga-like toxins39;however, the sequence did not reveal genes encoding speci®chomologues of the A or B subunits of shiga toxin. Also not detectedwere genes encoding homologues of E. coli heat stable toxin (ST),which have been detected in other pathogenic strains of V.cholerae40.Colonization factors. The critical intestinal colonization factorof V. cholerae is the TCP, a type IV pilus5,41. The genomesequence con®rmed that the genes involved in TCP assembly(tcpABCDEFGHIJNQRST) reside on chromosome 1 (ref. 20) as partof a proposed `pathogenicity island' (also referred to as VPI)composed of recently acquired DNA that encodes not only TCP,but also other genes associated with the ToxR regulatory cascade,such as acfABCD, toxT, aldA and tagAB10,11. Trinucleotide composi-tion analysis suggest that this 45.3-kb segment begins at a 20-bp siteupstream of aldA, and encompasses a helicase-related protein and atranscriptional activator which both share homology with bacter-iophage proteins. At the other end of the segment of atypicaltrinucleotide composition is a phage family integrase and the

articles


V.cholerae VC0512 V.cholerae VCA1034

V.cholerae VCA0974 V.cholerae VCA0068

V.cholerae VC0825 V.cholerae VC0282

V.cholerae VCA0906 V.cholerae VCA0979

V.cholerae VCA1056 V.cholerae VC1643


V.cholerae VC0514

V.cholerae VCA0773V.cholerae VC1868

V.cholerae VC1313 V.cholerae VC1859


V.cholerae VCA0658 V.cholerae VC1405V.cholerae VC1298

V.cholerae VC1248V.cholerae VCA0864 V.cholerae VCA0176

V.cholerae VCA0220 V.cholerae VC1289

V.cholerae A1069V.cholerae VC2439

V.cholerae 1967V.cholerae A0031


V.cholerae VCA0988V.cholerae VC0216 V.cholerae VC0449

V.cholerae VCA0008V.cholerae VC1406

V.cholerae VC1535V.cholerae VC0840

V.cholerae VC0098

E.coli gi1787690

E.coli gi1789453

C.jejuni Cj0144

C.jejuni Cj0262c

B.subtilis gi2633766Synechocystis sp. gi1001299

Synechocystis sp. gi1001300Synechocystis sp. gi1652276

Synechocystis sp. gi1652103H.pylori gi2313716H.pylori 99 gi4155097

C.jejuni Cj1190cC.jejuni Cj1110c

A.fulgidus gi2649560A.fulgidus gi2649548

B.subtilis gi2634254B.subtilis gi2632630B.subtilis gi2635607B.subtilis gi2635608B.subtilis gi2635609

B.subtilis gi2635610 B.subtilis gi2635882

E.coli gi1788195E.coli gi2367378E.coli gi1788194

V.cholerae VCA1092

H.pylori gi2313186H.pylori 99 gi4154603

C.jejuni Cj1564

C.jejuni Cj1506cH.pylori gi2313163H.pylori 99 gi4154575

H.pylori gi2313179H.pylori 99 gi4154599

C.jejuni Cj0019cC.jejuni Cj0951c

C.jejuni Cj0246cB.subtilis gi2633374

T.maritima TM0014 V.cholerae VC1403

V.cholerae VCA1088T.pallidum gi3322777

T.pallidum gi3322939T.pallidum gi3322938B.burgdorferi gi2688522

T.pallidum gi3322296B.burgdorferi gi2688521

T.maritima TM0429T.maritima TM0918T.maritima TM0023

T.maritima TM1428T.maritima TM1143

T.maritima TM1146 P.abyssi PAB1308

P.horikoshii gi3256846 P.abyssi PAB1336P.horikoshii gi3256896P.abyssi PAB2066P.horikoshii gi3258290P.abyssi PAB1026P.horikoshii gi3256884

D.radiodurans DRA00354D.radiodurans DRA0353

D.radiodurans DRA0352V.cholerae VC1394

P.abyssi PAB1189P.horikoshii gi3258414

B.burgdorferigi 2688621M.tuberculosis gi1666149

V.cholerae VC0622

**

**

***

**

****

****

***

****

**

*

****

**

**

**

****

**

****

*

****

** ****

***

*

**

***

****

**

*

**

**

*

Figure 6 Phylogenetic tree of methyl-accepting chemotactic proteins (MCP) homologues

in completed genomes. Homologues of MCP were identi®ed by FASTA3 searches of all

available complete genomes. Amino-acid sequences of the proteins were aligned using

CLUSTALW, and a neighbour-joining phylogenetic tree was generated from the alignment

using the PAUP* program (using a PAM-based distance calculation). Hypervariable

regions of the alignment and positions with gaps in many of the sequences were excluded

from the analysis. Nodes with signi®cant bootstrap values are indicated: two asterisks,

.70%; asterisk, 40±70%.


other copy of the 20-bp site, which is presumably the target forintegration of the island onto the chromosome10,11. It has beenproposed that the TCP/ACF island corresponds to the genome of a®lamentous phage that uses TCP pilin as a coat protein4. However,other than the three genes encoding phage-related proteins (that is,the helicase, transcriptional activator and integrase) we could ®ndno other genes on the island that encoded products with signi®canthomology to the conserved gene products of other ®lamentousphages or the structural proteins of non®lamentous phages.

The maltose-sensitive haemagglutinin (MSHA) is unique to theEl Tor biotype of V. cholerae. Initially characterized as a haemag-glutinin, it was later found to be a type IV pilus42,43. The MSHAbiogenesis (MshHIJKLMNEGF) and structural (MshBACD) pro-teins are all clustered on chromosome 1. There are no apparentintegrases or transposases that might de®ne this region as apathogenicity island or suggest an origin for it other than V.cholerae. In support of this conclusion, trinucleotide compositionanalysis shows that this region has similar composition to the rest ofthe chromosome, suggesting that if these genes were acquired it wasvery early in the Vibrio phylogenetic history. Recently, severalinvestigators have reported that MSHA is not required for intestinalcolonization, nor does it seem to appreciably affect the ef®ciency ofcolonization44±46, but instead plays a role in bio®lm formation27,28.Accordingly, this pilus may be important for the environmental®tness of Vibrio species rather than for pathogenic potential.

The pilA region of V. cholerae genome apparently encodes a thirdtype IV pilus, although it has not been visualized47. This gene clusterincludes a gene encoding a prepilin peptidase (PilD) that is requiredfor the ef®cient processing of protein complexes with type IVprepilin-like signal sequences including TCP, MSHA and EPS47,48.The EPS system of V. cholerae encodes a type II secretion systeminvolved in extracellular export of CT and other proteins. The EPSsystem is encoded by chromosome 1 but, like the MSHA genes,trinucleotide analysis suggests that the EPS genes of V. cholerae havenot been recently acquired. In contrast, trinucleotide compositionanalysis suggests that the pilA gene cluster was acquired by hori-zontal transfer. Thus, analysis of the V. cholerae genome sequenceprovides some evidence that older gene clusters, like MSHA andEPS, have become dependent on newly acquired genes such as PilD.

ConclusionsThe Vibrio cholerae genome sequence provides a new starting pointfor the study of this organism's environmental and pathobiologicalcharacteristics. It will be interesting to determine the gene expres-sion patterns that are unique to its survival and replication duringhuman infection35 as well as in the environment13,14,16. Additionally,the genomic sequence of V. cholerae should facilitate the study ofthis model multi-chromosomal prokaryotic organism. Compara-tive genomics between several species in the genus Vibrio willprovide a better understanding of the origin of the new smallchromosome and the role that it plays in Vibrio biology. Thegenome sequence may also provide important clues to understand-ing the metabolic and regulatory networks that link genes on thetwo chromosomes. Finally, V. cholerae clearly represents a promis-ing genetic system for studying how several horizontally acquiredloci located on separate chromosomes can still ef®ciently interact atthe regulatory, cell biology and biochemical levels. M

MethodsWhole-genome random sequencing procedure.

Vibrio cholerae N16961 was grown from a single isolated colony. Cloning, sequencing andassembly were as described for genomes sequenced by TIGR18. One small-insert plasmidlibrary (2±3 kb) was generated by random mechanical shearing of genomic DNA. Onelarge insert library was ligated into l-DASHII/EcoRI vector (Stratagene). In the initialsequence phase, approximately sevenfold sequence coverage was achieved with 49,633sequences from plasmid clones. Sequences from both ends of 383 l-clones served as agenome scaffold, verifying the orientation, order and integrity of the contigs. The plasmidand l sequences were jointly assembled using TIGR Assembler. Sequence gaps were closed

by editing the end sequences and/or primer walking on plasmid clones. Physical gaps wereclosed by direct sequencing of genomic DNA, or combinatorial polymerase chain reaction(PCR) followed by sequencing the PCR product. The ®nal genome sequence is based on51,164 sequences.

ORF prediction and gene family identi®cation. An initial set of ORFs, likely to encodeproteins, was identi®ed with GLIMMER49, and those shorter than 30 codons wereeliminated. ORFs that overlapped were visually inspected, and in some cases removed.ORFs were searched against a non-redundant protein database18. Frameshifts and pointmutations were detected and corrected where appropriate. Remaining frameshifts andpoint mutations are considered to be authentic and were annotated as `authentic frame-shift' or `authentic point mutation'. ORFs were also analysed with two sets of hiddenMarkov models (HMMs) constructed for a number of conserved protein families (1,313from Pfam v3.1 (ref. 50) and 476 from the TIGRFAM) by use of the HMMER package.TopPred was used to identify membrane-spanning domains in proteins.

Paralogous gene families were constructed by searching the ORFs against themselvesusing BLASTX, identifying matches with E # 10-5 over 60% of the query search length,and subsequently clustering these matches into multigene families. Multiple alignmentsfor these protein families were generated with the CLUSTALW program and thealignments scrutinized.

Distribution of all 64 trinucleotides (3-mers) for each chromosome was determined,and the 3-mer distribution in 2,000-bp windows that overlapped by half their length(1,000 bp) across the genome was computed. For each window, we computed the x2

statistic on the difference between its 3-mer content and that of the whole chromosome. Alarge value of this statistic indicates that the 3-mer composition in this window is differentfrom the rest of the chromosome. Probability values for this analysis are based on theassumption that the DNA composition is relatively uniform throughout the genome.Because this assumption may be incorrect, we prefer to interpret high x2 values merely asindicators of regions on the chromosome that appear unusual and demand further scrutiny.

Homologues of the genes of interest were identi®ed using the BLASTP and FASTA3search programs. All homologues were then aligned to each other using the CLUSTALWprogram with default settings. Phylogenetic trees were generated from the alignmentsusing the neighbour-joining algorithm as implemented by the PAUP* program (with aPAM matrix based distance calculation). Regions of the alignment that were hypervariableor were of low con®dence were excluded from the phylogenetic analysis. All alignments areavailable upon request.

Received 3 April; accepted 18 May 2000.

1. Wachsmuth, K., Olsvik, é., Evins, G. M. & Popovic, T. in Vibrio Cholerae And Cholera: Molecular To

Global Perspective (eds Wachsmuth, I. K., Blake, P. A. & Olsvik, é.) 357±370 (ASM Press, Washington

DC, 1994).

2. Faruque, S. M., Albert, M. J. & Mekalanos, J. J. Epidemiology, genetics, and ecology of toxigenic Vibrio

cholerae. Microbiol. Mol. Biol. Rev. 62, 1301±1314 (1998).

3. Waldor, M. K. & Mekalanos, J. J. Lysogenic conversion by a ®lamentous phage encoding cholera toxin.

Science 272, 1910±1914 (1996).

4. Karaolis, D. K., Somara, S., Maneval, D. R. Jr, Johnson, J. A. & Kaper, J. B. A bacteriophage encoding a

pathogenicity island, a type-IV pilus and a phage receptor in cholera bacteria. Nature 399, 375±379

(1999).

5. Brown, R. C. & Taylor, R. K. Organization of tcp, acf, and toxT genes within a ToxT-dependent

operon. Mol. Microbiol. 16, 425±439 (1995).

6. Hochhut, B. & Waldor, M. K. Site-speci®c integration of the conjugal Vibrio cholerae SXTelement into

prfC. Mol. Microbiol. 32, 99±110 (1999).

7. Yildiz, F. H. & Schoolnik, G. K. Vibrio cholerae O1 El Tor: identi®cation of a gene cluster required for

the rugose colony type, exopolysaccharide production, chlorine resistance, and bio®lm formation.

Proc. Natl Acad. Sci. USA 96, 4028±4033 (1999).

8. Bik, E. M., Bunschoten, A. E., Gouw, R. D. & Mooi, F. R. Genesis of the novel epidemic Vibrio cholerae

O139 strain: evidence for horizontal transfer of genes involved in polysaccharide synthesis. EMBO J

14, 209±216 (1995).

9. Waldor, M. K., Colwell, R. & Mekalanos, J. J. The Vibrio cholerae O139 serogroup antigen includes an

O-antigen capsule and lipopolysaccharide virulence determinants. Proc. Natl Acad. Sci. USA 91,

11388±11392 (1994).

10. Kovach, M. E., Shaffer, M. D. & Peterson, K. M. A putative integrase gene de®nes the distal end of a

large cluster of ToxR-regulated colonization genes in Vibrio cholerae. Microbiology 142, 2165±2174

(1996).

11. Karaolis, D. K. et al. A Vibrio cholerae pathogenicity island associated with epidemic and pandemic

strains. Proc. Natl Acad. Sci. USA 95, 3134±3139 (1998).

12. Mekalanos, J. J., Rubin, E. J. & Waldor, M. K. Cholera: molecular basis for emergence and

pathogenesis. FEMS Immunol. Med. Microbiol. 18, 241±248 (1997).

13. Colwell, R. R. Global climate and infectious disease: the cholera paradigm. Science 274, 2025±2031

(1996).

14. Colwell, R. R. & Spira, W. M. in Cholera (eds Barua, D. & Greenough, W. B. III) 107±127 (Plenum

Medical, New York, 1992).

15. Lobitz, B. et al. Climate and infectious disease: use of remote sensing for detection of Vibrio cholerae by

indirect measurement. Proc. Natl Acad. Sci. USA 97, 1438±1443 (2000).

16. Colwell, R. R. & Huq, A. Environmental reservoir of Vibrio cholerae. The causative agent of cholera.

Ann. NY Acad. Sci. 740, 44±54 (1994).

17. Roszak, D. B. & Colwell, R. R. Survival strategies of bacteria in the natural environment. Microbiol.

Rev. 51, 365±379 (1987).

18. Fraser, C. M. et al. Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi. Nature 390,

580±586 (1997).

19. Yamaichi, Y., Iida, T., Park, K. S., Yamamoto, K. & Honda, T. Physical and genetic map of the genome

of Vibrio parahaemolyticus: presence of two chromosomes in Vibrio species. Mol. Microbiol. 31, 1513±

1521 (1999).

20. Trucksis, M., Michalski, J., Deng, Y. K. & Kaper, J. B. The Vibrio cholerae genome contains two unique

circular chromosomes. Proc. Natl Acad. Sci. USA 95, 14464±14469 (1998).

articles

482 NATURE | VOL 406 | 3 AUGUST 2000 | www.nature.com© 2000 Macmillan Magazines Ltd

21. Lobry, J. R. Asymmetric substitution patterns in the two DNA strands of bacteria. Mol. Biol. Evol. 13,

660±665 (1996).

22. Rowe-Magnus, D. A., Guerout, A. M. & Mazel, D. Super-integrons. Res. Microbiol. 150, 641±651

(1999).

23. Hall, R. M., Brookes, D. E. & Stokes, H. W. Site-speci®c insertion of genes into integrons: role of the

59-base element and determination of the recombination cross-over point. Mol. Microbiol. 5, 1941±

1959 (1991).

24. Davis, B. M., Kimsey, H. H., Chang, W. & Waldor, M. K. The Vibrio cholerae O139 Calcutta

bacteriophage CTXphi is infectious and encodes a novel repressor. J. Bacteriol. 181, 6779±6787

(1999).

25. Mekalanos, J. J. Duplication and ampli®cation of toxin genes in Vibrio cholerae. Cell 35, 253±263

(1983).

26. Mazel, D., Dychinco, B., Webb, V. A. & Davies, J. A distinctive class of integron in the Vibrio cholerae

genome. Science 280, 605±608 (1998).

27. Watnick, P. I. & Kolter, R. Steps in the development of a Vibrio cholerae El Tor bio®lm. Mol. Microbiol.

34, 586±595 (1999).

28. Watnick, P. I., Fullner, K. J. & Kolter, R. A role for the mannose-sensitive hemagglutinin in bio®lm

formation by Vibrio cholerae El Tor. J. Bacteriol. 181, 3606±3609 (1999).

29. Huq, A. et al. Ecological relationships between Vibrio cholerae and planktonic crustacean copepods.

Appl. Environ. Microbiol. 45, 275±283 (1983).

30. Bassler, B. L., Yu, C., Lee, Y. C. & Roseman, S. Chitin utilization by marine bacteria. Degradation

and catabolism of chitin oligosaccharides by Vibrio furnissii. J. Biol. Chem. 266, 24276±24286

(1991).

31. Yildiz, F. H. & Schoolnik, G. K. Role of rpoS in stress survival and virulence of Vibrio cholerae. J.

Bacteriol. 180, 773±784 (1998).

32. Bassler, B. L., Greenberg, E. P. & Stevens, A. M. Cross-species induction of luminescence in the

quorum-sensing bacterium Vibrio harveyi. J. Bacteriol. 179, 4043±4045 (1997).

33. Williams, S. G., Attridge, S. R. & Manning, P. A. The transcriptional activator HlyU of Vibrio cholerae:

nucleotide sequence and role in virulence gene expression. Mol. Microbiol. 9, 751±760 (1993).

34. Eisen, J. A. & Hanawalt, P. C. A phylogenomic study of DNA repair genes, proteins, and processes.

Mutat. Res. 435, 171±213 (1999).

35. Lee, S. H., Hava, D. L., Waldor, M. K. & Camilli, A. Regulation and temporal expression patterns of

Vibrio cholerae virulence genes during infection. Cell 99, 625±634 (1999).

36. Lin, W. et al. Identi®cation of a Vibrio cholerae RTX toxin gene cluster that is tightly linked to the

cholera toxin prophage. Proc. Natl Acad. Sci. USA 96, 1071±1076 (1999).

37. Wu, Z., Milton, D., Nybom, P., Sjo, A. & Magnusson, K. E. Vibrio cholerae hemagglutinin/protease

(HA/protease) causes morphological changes in cultured epithelial cells and perturbs their para-

cellular barrier function. Microb. Pathog. 21, 111±123 (1996).

38. Alm, R. A., Stroeher, U. H. & Manning, P. A. Extracellular proteins of Vibrio cholerae: nucleotide

sequence of the structural gene (hlyA) for the haemolysin of the haemolytic El Tor strain 017 and

characterization of the hlyA mutation in the non- haemolytic classical strain 569B. Mol. Microbiol. 2,

481±488 (1988).

39. O'Brien, A. D., Chen, M. E., Holmes, R. K., Kaper, J. & Levine, M. M. Environmental and human

isolates of Vibrio cholerae and Vibrio parahaemolyticus produce a Shigella dysenteriae 1 (Shiga)-like

cytotoxin. Lancet 1, 77±78 (1984).

40. Ogawa, A., Kato, J., Watanabe, H., Nair, B. G. & Takeda, T. Cloning and nucleotide sequence of a heat-

stable enterotoxin gene from Vibrio cholerae non-O1 isolated from a patient with traveler's diarrhea.

Infect. Immun. 58, 3325±3329 (1990).

41. Manning, P. A. The tcp gene cluster of Vibrio cholerae. Gene 192, 63±70 (1997).

42. Jonson, G., Holmgren, J. & Svennerholm, A. M. Identi®cation of a mannose-binding pilus on Vibrio

cholerae El Tor. Microb. Pathog. 11, 433±441 (1991).

43. Jonson, G., Lebens, M. & Holmgren, J. Cloning and sequencing of Vibrio cholerae mannose-sensitive

haemagglutinin pilin gene: localization of mshA within a cluster of type 4 pilin genes. Mol. Microbiol.

13, 109±118 (1994).

44. Attridge, S. R., Manning, P. A., Holmgren, J. & Jonson, G. Relative signi®cance of mannose-sensitive

hemagglutinin and toxin-coregulated pili in colonization of infant mice by Vibrio cholerae El Tor.

Infect. Immun. 64, 3369±3373 (1996).

45. Tacket, C. O. et al. Investigation of the roles of toxin-coregulated pili and mannose- sensitive

hemagglutinin pili in the pathogenesis of Vibrio cholerae O139 infection. Infect. Immun. 66, 692±695

(1998).

46. Thelin, K. H. & Taylor, R. K. Toxin-coregulated pilus, but not mannose-sensitive hemagglutinin, is

required for colonization by Vibrio cholerae O1 El Tor biotype and O139 strains. Infect. Immun. 64,

2853±2856 (1996).

47. Fullner, K. J. & Mekalanos, J. J. Genetic characterization of a new type IV-A pilus gene cluster found in

both classical and El Tor biotypes of Vibrio cholerae. Infect. Immun. 67, 1393±1404 (1999).

48. Marsh, J. W. & Taylor, R. K. Identi®cation of the Vibrio cholerae type 4 prepilin peptidase required for

cholera toxin secretion and pilus formation. Mol. Microbiol. 29, 1481±1492 (1998).

49. Salzberg, S. L., Delcher, A. L., Kasif, S. & White, O. Microbial gene identi®cation using interpolated

Markov models. Nucleic Acids Res. 26, 544±548 (1998).

50. Bateman, A. et al. Pfam 3.1: 1313 multiple alignments and pro®le HMMs match the majority of

proteins. Nucleic Acids Res. 27, 260±262 (1999).

Supplementary information is available on Nature's World-Wide Web site (http://www.nature.com) or as paper copy from the London editorial of®ce of Nature.

Acknowledgements

This work was supported by the National Institutes of Health, National Institute of Allergyand Infectious Disease. We thank M. Heaney, V. Sapiro, B. Lee, M. Holmes and B. Vincentfor database and software support.

Correspondence and requests for materials should be addressed to C.M.F.(e-mail: [email protected]). The annotated genome sequence and the gene family alignmentsare available at hhttp://www.tigr.org/tbd/mdbi. The sequences have been deposited inGenBank with accession number AE003852 (chromosome 1) and AE003853 (chromo-some 2).

articles

NATURE | VOL 406 | 3 AUGUST 2000 | www.nature.com 483© 2000 Macmillan Magazines Ltd

articles


56

67

1112

910

611

6

58

146

66

95

1911

95

1313

914

913

1212

55

1912

5

57

55

98

913

68

117

69

119

117

69

1113

125

86

56

69

99

126

712

125

912

119

10

59

75

139

1011

56

66

88

86

59

512

1211

87

115

610

910

59

1213

1211

65

1110

13

712

77

88

614

1210

66

67

125

59

612

56

89

13

145

67

115

1110

1011

1011

11

117

59

65

75

58

12

1210

1010

89

6

126

510

512

511

712

911

1213

1013

86

116

156

77

88

65

77

712

810

712

126

58

1110

109

99

66

910

611

66

105

125

810

613

118

512

59

1211

611

611

125

2

5

48

54

18

2

1

4

4

2

1

1

2

161

12

1

14

3

16S

b

16S

d

16S

f

16S

g16

Sh

23S

a

23S

b

23S

c

23S

d

23S

e

23S

f

23S

g23

Sh

5Sb

5Sc

5Sd

5Sg

5Sh

RN

aseP

RN

A

SR

PR

NA

tmR

NA

59

913

68

811

75

1011

1111

99

128

610

177

1222

109

911

1212

59

910

12

115

1010

126

67

511

8

59

66

1110

1210

1210

1314

128

916

58

98

512

1210

56

911

5

59

86

139

65

96

8

85

1014

135

55

95

136

9

812

136

56

1

2

1

Mem

bran

e pr

otei

n

Am

ino

acid

bio

synt

hesi

s

Bio

synt

hesi

s of

cof

acto

rs, p

rost

hetic

gro

ups,

and

car

riers

Cel

l env

elop

e

Cel

lula

r pr

oces

ses

Cen

tral

inte

rmed

iary

met

abol

ism

DN

A m

etab

olis

m

Ene

rgy

met

abol

ism

Fatty

aci

d an

d ph

osph

olip

id m

etab

olis

m

Reg

ulat

ory

func

tions

Tran

scrip

tion

Tran

spor

t and

bin

ding

pro

tein

s

Nuc

leot

ides

Pro

tein

fate

/pro

tein

syn

thes

is

Con

serv

ed h

ypot

hetic

al

No

data

base

mat

ch

21

16S

c2

1

1

16S

a1

1

16S

e1

1

12

34

56

78

910

1213

1415

1618

1920

2122

2324

2627

2829

3031

3233

3435

3637

3940

4142

4344

4546

4748

4950

5152

5455

5657

5860

6162

6364

6566

6768

6970

7172

7374

7576

7879

8081

8283

8485

8687

8889

9091

9293

9495

9798

9910

0

103

105

106

108

111

112

113

114

115

116

117

118

119

120

122

125

126

127

128

129

130

131

132

134

135

136

137

139

140

142

144

146

147

148

149

150

151

152

153

154

156

157

158

159

161

162

164

165

166

167

168

170

171

172

173

174

175

177

178

179

180

181

182

183

184

185

186

187

188

190

191

192

194

194

195

196

198

199

200

201

202

203

204

206

207

208

209

210

211

212

213

214

215

216

217

220

221

222

223

224

225

227

228

229

230

231

232

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

249

250

251

252

254

256

257

258

259

260

262

263

264

265

266

268

267

269

270

271

274

275

276

277

278

280

281

282

283

284

285

286

287

288

289

291

293

295

296

297

298

299

300

302

303

304

305

306

307

308

309

312

315

317

318

319

320

321

322

323

324

325

326

327

328

329

330

331

332

333

334

335

336

337

338

339

340

341

342

343

344

345

346

348

349

350

353

354

355

356

357

359

360

361

362

365

366

369

370

371

372

373

374

376

377

378

379

381

384

385

385

386

389

390

391

392

393

394

395

396

397

398

399

400

402

403

404

405

406

407

408

409

410

411

412

413

414

415

416

417

418

419

420

421

422

423

424

425

426

428

429

430

431

432

433

434

437

438

439

440

441

442

443

444

445

446

447

449

450

452

453

454

455

456

457

459

460

461

462

463

464

465

466

467

468

469

470

471

472

473

474

475

476

477

478

480

481

482

483

485

486

487

488

489

490

491

492

493

494

495

496

498

499

500

501

502

503

505

506

508

509

510

512

513

514

515

516

517

518

519

521

522

523

524

525

526

528

529

530

531

532

533

534

535

537

538

539

540

541

542

543

544

545

547

549

550

551

552

553

554

556

557

558

559

560

562

563

564

565

566

566

567

568

570

571

573

574

575

576

577

578

579

580

582

581

583

585

586

587

589

590

591

592

593

594

595

596

597

600

601

602

603

604

605

606

607

608

609

610

611

612

613

614

615

616

617

618

619

620

621

622

623

624

626

627

628

629

630

631

632

633

634

636

637

638

639

640

641

642

643

644

645

647

648

649

650

651

652

653

654

655

656

657

658

659

660

661

662

663

664

665

667

668

671

672

673

674

675

676

677

678

680

681

682

683

684

685

687

688

689

690

691

692

693

694

695

696

697

698

700

701

702

703

704

705

706

708

709

710

711

714

715

717

718

719

720

721

722

723

724

725

726

727

728

730

731

732

734

736

737

739

741

742

743

744

745

746

747

748

749

750

751

752

753

755

756

757

758

759

760

761

762

763

764

766

767

768

769

770

771

772

773

774

775

776

777

778

779

780

781

783

784

786

787

788

790

791

792

793

794

795

796

798

799

800

801

802

803

804

806

807

809

810

811

812

813

814

815

817

818

819

820

821

822

823

824

825

826

827

828

829

830

831

832

833

834

835

836

837

838

839

840

841

842

843

844

845

846

847

848

849

851

852

853

854

855

856

857

858

859

860

861

862

863

864

866

869

870

873

875

876

880

881

884

886

888

889

890

892

893

894

896

897

898

899

900

901

902

903

904

905

906

907

908

909

910

911

913

914

916

917

918

919

920

921

922

923

924

925

926

927

928

929

930

931

933

934

934

935

936

937

938

939

940

941

942

943

944

947

948

949

950

951

953

954

956

957

958

959

960

961

962

963

964

965

968

969

970

971

972

973

974

975

976

977

979

980

981

983

984

985

986

987

988

990

991

992

993

994

995

997

998

999

1000

1001

1002

1003

1004

1005

1006

1008

1009

1010

1011

1012

1013

1014

1015

1016

1017

1018

1021

1023

1024

1025

1026

1028

1029

1031

1033

1034

1035

1037

1036

1038

1039

1040

1041

1042

1043

1044

1045

1046

1047

1048

1049

1050

1053

1054

1056

1058

1059

1060

1061

1062

1063

1064

1066

1067

1069

1070

1071

1073

1074

1075

1076

1077

1079

1081

1083

1084

1085

1086

1087

1088

1089

1091

1092

1093

1094

1095

1096

1097

1098

1099

1101

1102

1103

1104

1105

1106

1107

1108

1110

1111

1112

1113

1114

1115

1117

1118

1119

1120

1121

1122

1123

1124

1125

1126

1127

1128

1129

1131

1132

1133

1134

1135

1136

1137

1138

1139

1140

1141

1144

1145

1147

1148

1149

1150

1151

1152

1153

1154

1155

1156

1157

1158

1159

1160

1161

1162

1163

1164

1165

1166

1167

1168

1169

1170

1171

1172

1173

1174

1177

1178

1179

1180

1181

1182

1183

1184

1185

1186

1188

1190

1194

1195

1196

1197

1198

1199

1200

1201

1202

1203

1204

1205

1206

1207

1209

1210

1211

1212

1213

1214

1215

1216

1217

1219

1220

1224

1226

1228

1229

1231

1232

1234

1235

1236

1237

1238

1239

1240

1242

1244

1245

1246

1248

1249

1250

1252

1253

1255

1256

1257

1258

1259

1260

1261

1263

1264

1265

1266

1267

1268

1269

1270

1271

1272

1273

1274

1275

1276

1277

1278

1279

1280

1282

1284

1285

1286

1287

1288

1289

1290

1291

1293

1295

1296

1297

1298

1300

1301

1302

1303

1304

1306

1307

1308

1309

1310

1311

1312

1313

1314

1315

1317

1318

1319

1320

1321

1323

1325

1327

1328

1329

1330

1332

1333

1334

1335

1336

1337

1338

1339

1340

1341

1342

1343

1344

1345

1346

1347

1349

1348

1350

1353

1354

1358

1359

1360

1361

1362

1363

1364

1365

1366

1367

1369

1370

1371

1372

1373

1374

1375

1376

1377

1379

1382

1384

1386

1388

1390

1391

1392

139

413

9713

9914

0014

0114

0214

0314

0514

06

1407

1408

1409

1410

1411

1413

1414

1415

1416

1417

1418

1419

1420

1421

1422

1423

1424

1425

1426

1427

1428

1429

1430

1431

1432

1433

1434

1435

1437

1438

1439

1441

1442

1444

1445

1446

1447

1448

1450

1451

1454

1457

1458

1460

1463

1465

1467

1469

1470

1473

1475

1476

1478

1482

1481

1483

1485

1486

1488

1490

1491

1492

1492

1494

1495

1496

1497

1499

1498

1500

1501

1502

1503

1504

1505

1506

1507

1508

1509

1511

1512

1513

1515

1516

1517

1518

1519

1520

1521

1522

1523

1524

1525

1526

1527

1529

1532

1534

1533

1535

1536

1537

1538

1539

1540

1541

1542

1543

1544

1546

1547

1548

1549

1550

1551

1552

1553

1554

1555

1557

1558

1560

1561

1562

1563

1565

1566

1567

1568

1570

1571

1573

1575

1577

1579

1580

1581

1582

1583

1584

1585

1587

1588

1589

1590

1591

1592

1593

1594

1595

1596

1597

1598

1599

1600

1601

1602

1603

1604

1605

1606

1607

1608

1609

1610

1611

1612

1614

1615

1617

1618

1619

1620

1621

1622

1623

1624

1625

1627

1628

1629

1630

1631

1632

1633

1634

1636

1635

1638

1639

1641

1643

1644

1645

1647

1649

1650

1651

1652

1653

1655

1656

1658

1659

1660

1663

1664

1665

1666

1667

1669

1670

1671

1672

1673

1674

1675

1678

1679

1680

1681

1682

1683

1684

1685

1686

1687

1688

1689

1690

1692

1693

1695

1697

1698

1700

1701

1703

1704

1706

1708

1709

1710

1711

1712

1713

1714

1715

1716

1717

1718

1719

1720

1721

1722

1723

1725

1726

1727

1730

1732

1735

1736

1738

1740

1741

1743

1745

1746

1748

1749

1750

1751

1753

1754

1755

1756

1757

1758

1759

1760

1760

1761

1762

1763

1764

1765

1766

1767

1768

1769

1770

1771

1772

1773

1774

1775

1776

1777

1778

1779

1781

1782

1783

1784

1786

1788

1789

1791

1794

1797

1798

1799

1800

1803

1805

1806

1807

1808

1811

1814

1815

1817

1819

1820

1821

1822

1824

1825

1826

1827

1828

1831

1832

1833

1834

1835

1836

1837

1838

1839

1840

1843

1844

1845

1846

1847

1848

1849

1850

1851

1852

1853

1854

1855

1856

1857

1859

1860

1861

1862

1863

1864

1865

1866

1867

1868

1869

1870

1871

1872

1873

1874

1875

1877

1878

1880

1879

1882

1883

1884

1885

1886

1887

1888

1889

1890

1892

1893

1894

1896

1898

1899

1900

1901

1902

1903

1904

1905

1906

1907

1909

1910

1911

1912

1915

1916

1918

1920

1921

1922

1923

1925

1926

1927

1928

1929

1931

1933

1934

1935

1936

1937

1938

1939

1940

1941

1942

1944

1945

1947

1949

1950

1951

1952

1953

1955

1956

1959

1960

1962

1963

1964

1965

1966

1967

1968

1970

1971

1972

1973

1974

1975

1976

1977

1978

1979

1981

1980

1983

1984

1985

1986

1987

1989

1991

1990

1992

1993

1994

1995

1997

1998

2000

2001

2002

2003

2004

2006

2007

2008

2009

2011

2012

2013

2014

2015

2016

2017

2018

2019

2021

2022

2023

2024

2026

2027

2028

2030

2031

2032

2033

2035

2036

2037

2038

2039

2041

2042

2043

2045

2046

2047

2048

2049

2051

2052

2053

2055

2056

2057

2058

2059

2060

2061

2062

2063

2064

2066

2067

2068

2069

2070

2072

2073

2074

2075

2077

2080

2081

2082

2083

2084

2085

2086

2087

2088

2089

2092

2093

2094

2095

2096

2097

2099

2100

2101

2103

2106

2107

2108

2109

2110

2111

2113

2115

2116

2118

2119

2120

2121

2123

2126

2127

2128

2129

2130

2131

2132

2133

2135

2136

2137

2140

2141

2142

2143

2144

2145

2146

2152

2153

2156

2157

2159

2160

2161

2162

2164

2166

2167

2168

2171

2172

2174

2175

2176

2178

2179

2180

2181

2182

2183

2184

2185

2187

2188

2190

2191

2192

2193

2194

2195

2196

2197

2198

2201

2202

2203

2205

2206

2207

2208

2209

2210

2211

2212

2213

2214

2215

2216

2217

2220

2222

2223

2224

2225

2226

2227

2229

2230

2231

2232

2233

2234

2235

2236

2237

2238

2240

2242

2244

2245

2245

2246

2247

2248

2249

2250

2251

2252

2253

2254

2255

2256

2257

2258

2259

2260

2261

2262

2265

2266

2267

2268

2269

2270

2271

2272

2273

2274

2276

2277

2278

2279

2280

2281

2282

2283

2285

2286

2287

2289

2290

2291

2292

2293

2294

2295

2297

2298

2299

2300

2301

2302

2303

2305

2307

2308

2309

2310

2311

2312

2316

2319

2320

2322

2323

2324

2329

2330

2332

2333

2334

2335

2337

2338

2339

2340

2341

2342

2343

2344

2344

2345

2346

2347

2348

2349

2350

2352

2353

2355

2356

2358

2359

2360

2362

2363

2364

2365

2366

2368

2369

2370

2371

2373

2374

2376

2377

2378

2379

2380

2381

2382

2383

2384

2385

2386

2387

2388

2389

2390

2391

2394

2395

2396

2397

2398

2399

2400

2401

2402

2403

2404

2405

2406

2407

2409

2412

2413

2414

2415

2416

2417

2418

2419

2420

2421

2422

2423

2424

2425

2425

2426

2427

2428

2430

2431

2432

2433

2434

2435

2436

2437

2438

2439

2440

2441

2442

2443

2444

2445

2446

2447

2448

2450

2451

2452

2453

2454

2456

2458

2459

246

024

6124

6224

6324

6424

6524

6624

6724

6924

7024

7224

7424

7524

7624

7924

8024

8124

8224

8324

8424

8524

8724

8824

8924

9024

9124

9224

9324

9424

9724

9824

9925

0025

0125

0225

0325

0425

0525

0625

0725

0825

10

2510

2511

2513

2514

2517

2518

2519

2520

2522

2523

2524

2525

2527

2528

2529

2531

2532

2534

2535

2536

2537

2538

2539

2541

2542

2544

2545

2547

2548

2549

2550

255

225

5325

5425

5525

5725

5825

5925

6025

6125

6225

6325

6425

6525

6625

6725

6825

6925

7125

7225

7625

7725

7925

8125

8425

9025

9325

9525

9625

9825

9926

0026

0126

0226

0326

0426

0626

0726

0826

0926

1026

1326

1426

1526

1626

1726

1826

1926

2026

2126

2226

2326

2426

2526

2626

2726

28

2628

2629

2630

2631

2632

2633

2634

2635

2636

2637

2638

2641

2642

2643

2644

2645

2646

2647

2649

2651

2653

2654

2655

2656

2657

2660

2661

2662

2664

2666

2668

267

026

7126

7226

7326

7426

7526

7626

7726

7826

8126

8326

8426

8526

8826

8926

9026

9126

9226

9326

9426

9526

9626

9726

9826

9927

0027

0127

0227

0327

0527

0627

0827

1027

1127

1227

1327

1427

1527

1627

1727

1927

1827

2027

2127

2227

2327

2427

2527

26

2727

2729

2730

2731

2732

2733

2734

2736

2738

2739

2740

2741

2742

2743

2744

2746

2747

2748

2749

2750

2751

2755

2756

2757

2758

2759

2760

2761

2762

2764

2765

2766

2767

2768

2770

2772

2773

2774

2775

12

34

56

78

910

1113

1415

1617

1819

2021

2223

2425

2627

2829

3031

3233

3435

3637

3839

4041

4243

4445

4748

4950

5152

5355

5657

5859

6163

6465

6667

6870

7172

7374

7576

7778

7980

8283

8485

8688

90

9192

9495

9697

9899

100

101

102

103

104

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

123

124

127

128

129

130

131

132

133

134

135

136

137

139

140

141

142

144

146

147

148

150

151

152

154

155

156

157

159

160

161

164

165

167

168

169

170

171

172

173

174

175

176

178

179

180

181

181

182

183

186

189

190

191

192

193

194

195

196

197

198

199

200

201

202

204

205

207

208

210

211

212

213

214

217

218

219

220

221

222

223

225

227

228

229

230

231

232

233

235

237

238

239

240

241

242

243

244

245

246

247

248

249

250

253

254

255

256

257

258

261

262

263

264

265

266

267

268

269

270

271

272

274

275

276

277

278

279

280

281

282

283

284

285

286

287

288

290

291

293

294

299

300

301

303

307

308

309

310

311

313

316

317

318

322

325

328

330

331

334

337

338

340

341

344

345

346

347

349

350

351

354

355

356

358

360

361

363

365

366

367

368

36937

037

137

237

437

537

637

938

038

238

538

738

839

139

539

639

739

940

040

140

240

540

640

840

941

041

341

441

541

641

741

942

042

142

342

442

542

843

143

243

343

643

944

044

144

244

3

447

448

449

450

451

453

454

455

457

458

459

460

463

464

465

468

470

472

474

475

476

479

481

483

485

486

487

490

491

493

495

496

498

501

504

505

506

507

508

510

511

512

513

514

516

517

518

519

520

521

522

523

524

526

527

528

529

530

531

532

534

535

536

537

538

539

540

542

543

544

545

546

549

550

552

554

556

557

558

559

560

561

563

564

565

566

567

568

571

572

573

574

575

576

578

580

581

582

583

584

585

586

587

588

589

590

591

592

593

595

596

599

600

601

602

603

604

605

606

607

608

610

612

614

615

616

617

618

619

620

621

623

624

625

627

628

629

630

631

632

633

634

635

637

638

639

640

641

644

645

646

647

648

649

650

651

652

653

654

655

656

657

658

659

661

662

663

663

665

666

667

669

671

673

674

675

676

677

678

679

680

681

682

683

684

685

686

687

688

689

690

691

692

693

694

695

697

699

700

702

703

704

705

706

707

708

710

709

711

712

714

716

717

718

719

720

723

724

725

726

727

728

729

730

732

734

735

736

737

738

740

741

744

745

747

748

749

751

752

753

754

756

757

758

759

760

762

763

764

765

766

767

768

769

772

773

774

776

777

778

779

780

781

783

782

784

785

786

788

789

790

791

792

793

795

798

799

800

801

802

803

804

805

806

807

808

809

811

812

813

814

815

817

818

820

822

823

824

825

827

828

829

830

832

833

834

835

836

837

838

840

843

846

847

848

849

850

851

852

853

854

855

856

859

860

862

863

864

865

867

870

871

872

873

875

876

877

880

881

882

883

884

885

886

887

888

889

890

891

892

893

894

895

896

897

898

900

901

902

903

904

905

906

907

908

909

910

911

912

913

914

915

917

920

921

922

923

924

925

926

927

928

929

930

931

935

936

938

937

939

940

941

943

944

945

946

947

949

951

952

954

955

956

957

958

960

961

962

963

964

965

969

971

972

974

975

976

977

978

979

980

981

982

983

984

985

986

987

988

989

990

991

992

993

994

996

997

998

999

1000

1001

1002

1004

1005

1006

1008

1010

1011

1012

1013

1015

1017

1018

1019

1020

1023

1025

1026

1027

1028

1029

1031

1033

1034

1036

1037

1038

1039

1040

1041

1043

1045

1046

1047

1048

1049

1050

1051

1052

1053

1054

1055

1056

1057

1058

1059

1060

1062

1063

1067

1068

1069

1071

1072

1073

1074

1075

1077

1078

1079

1080

1081

1082

1083

1084

1085

1086

1088

1089

1090

1091

1092

1093

1094

1095

1098

1099

1100

1101

1102

1104

1105

1108

1109

1110

1111

1112

1113

1114

1115

16s

rRN

A

23s

rRN

A

5s r

RN

A

3

Rep

eat r

egio

n

Thr

ee tR

NA

s

Rho

-ind.

term

inat

orO

ther

cat

egor

ies

1 K

b

Oth

er R

NA


articles DNA sequence of both chromosomes of the cholera...

Documents

Transcript of articles DNA sequence of both chromosomes of the cholera...