Comparative Microbial Genomics group
Center for Biological Sequence Analysis, The Technical University of Denmark (DTU)
Gene Regulation in Microbial Genomes
DNA curvature, melting, and flexibility
Dave Ussery
Comparative Microbial Genomics & Taxonomy Workshop
Petrópolis - Rio de Janeiro
17 August, 2006
Outline
• Introduction & disclaimer
• DNA curvature
• DNA flexibility
How to become a uropathogen: Comparative genomic analysis of extraintestinal pathogenic Escherichia coli strains
Elzbieta Brzuszkiewicz*†, Holger Brüggemann*‡, Heiko Liesegang*, Melanie Emmerth†, Tobias Ölschläger†, Gábor Nagy§, Kaj Albermann¶, Christian Wagner¶, Carmen Buchrieser‡, Levente Emődy§, Gerhard Gottschalk*, Jörg Hacker†, and Ulrich Dobrindt†‖
*Göttingen Genomics Laboratory, Institute of Microbiology and Genetics, Georg-August-Universität Göttingen, Grisebachstrasse 8, 37077 Göttingen, Germany; †Institute for Molecular Biology of Infectious Diseases, Bayerische Julius-Maximilians-Universität Würzburg, Röntgenring 11, 97070 Würzburg, Germany; §Institute of Medical Microbiology and Immunology, University of Pécs, Szigeti út 12, 7624 Pécs, Hungary; ¶Biomax Informatics AG, Lochhamerstrasse 9, 82152 Martinsried, Germany; and ‡Laboratoire de Génomique des Microorganismes Pathogènes, Institut Pasteur, 75724 Paris Cedex 15, France
Edited by Roy Curtiss, Arizona State University, Tempe, AZ, and approved July 2, 2006 (received for review April 18, 2006)
Uropathogenic Escherichia coli (UPEC) strain 536 (O6:K15:H31) is one of the model organisms of extraintestinal pathogenic E. coli (ExPEC). To analyze this strain’s genetic basis of urovirulence, we sequenced the entire genome and compared the data with the genome sequence of UPEC strain CFT073 (O6:K2:H1) and to the available genomes of nonpathogenic E. coli strain MG1655 (K-12) and enterohemorrhagic E. coli. The genome of strain 536 is ≈292 kb smaller than that of strain CFT073. Genomic differences between both UPEC are mainly restricted to large pathogenicity islands, parts of which are unique to strain 536 or CFT073. Genome comparison underlines that repeated insertions and deletions in certain parts of the genome contribute to genome evolution. Furthermore, 427 and 432 genes are only present in strain 536 or in both UPEC, respectively. The majority of the latter genes is encoded within smaller horizontally acquired DNA regions scattered all over the genome. Several of these genes are involved in increasing the pathogens’ fitness and adaptability. Analysis of virulence-associated traits expressed in the two UPEC O6 strains, together with genome comparison, demonstrate the marked genetic and phenotypic variability among UPEC. The ability to accumulate and express a variety of virulence-associated genes distinguishes ExPEC from many commensals and forms the basis for the individual virulence potential of ExPEC. Accordingly, instead of a common virulence mechanism, different ways exist among ExPEC to cause disease.
fitness | genome comparison | uropathogenic Escherichia coli
Uropathogenic Escherichia coli (UPEC) are the most common cause of community-acquired urinary tract infection (UTI) and are responsible for 70–90% of the estimated 150 million UTIs diagnosed annually (1). UPEC also cause ≈40% of all nosocomial UTI, thus representing one of the most frequently isolated nosocomial pathogens (2). These frequencies illustrate the magnitude of the problem but do not reflect disease diversity in the urinary tract. UTI may be acute, symptomatic with a varying severity and localization, but may also be sporadic, recurrent, or chronic. It is essential to understand the molecular basis of disease diversity on the bacterial side that determines the different disease types. UPEC are a geno- and phenotypically heterogeneous group of isolates restricted to a small number of O-serogroups that seem to represent different subclasses of facultative pathogens (3–5).
UPEC virulence factors are frequently encoded on pathogenicity islands (PAIs) (6–9). The two O6 strains 536 (pyelonephritis isolate) and CFT073 (urosepsis isolate) became generally accepted UPEC model organisms, and several PAIs of them have been described in detail (10–16). The complete CFT073 genome sequence shows a mosaic structure in terms of the distribution of backbone genes conserved in E. coli, and “foreign” genes, which presumably have been acquired horizontally (17). Genome comparison of CFT073, O157:H7 strain EDL933, and K-12 strain MG1655 revealed that only 39.2% of their combined set of proteins are common to all three strains (17–20), underlining the astonishing diversity among E. coli. Furthermore, the genome sequence of CFT073 revealed 1,623 strain-specific genes (21.2%). Comparison of both UPEC phenotypes and their genomes with other complete E. coli genome sequences should therefore help to identify sets of “UPEC-specific” and strain-specific proteins, respectively, that may form the basis of their different individual phenotypes and uropathogenic potential.
Results and Discussion
E. coli 536 Genome Sequence Determination and Comparative Analysis. The genome consists of a single circular chromosome of 4,938,875 bp. No plasmids were found. The 536 genome is 292 kb smaller than that of strain CFT073. Essentially, the additional DNA in CFT073 harbors genes of five cryptic prophages, which are absent from strain 536, as well as genes that are located in islands absent from other E. coli. The E. coli 536 genome contains one cryptic prophage region.
For the 536 genome, 4,747 putative coding sequences were predicted, ≈3,650 of which (77%) have highly similar orthologs in MG1655 (Fig. 1). From the remaining ORFs, 524 are also present in CFT073, which means that 89% of all ORFs of E. coli 536 have highly similar orthologs in the UPEC CFT073 genome (Table 1 and Table 4, which is published as supporting information on the PNAS web site). Further comparison with the genome sequences of enterohemorrhagic E. coli (EHEC) O157:H7 strains Sakai and EDL933 (18, 19) revealed ≈3,560 ORFs (75%) with highly similar orthologs in all published complete E. coli genomes. Of the remaining ORFs present in the genomes of strain 536 and at least one of these other four E. coli strains, 427 are mainly located within a region of the cryptic prophage or within the major PAIs of strain 536 (Table 5, which is published as supporting information on the PNAS web site).
Key features of the completely sequenced E. coli genomes are
Conflict of interest statement: No conflicts declared.
This paper was submitted directly (Track II) to the PNAS office.
Abbreviations: EHEC, enterohemorrhagic E. coli; ExPEC, extraintestinal pathogenic E. coli;IPEC, intestinal pathogenic E. coli; PAI, pathogenicity islands; PTS, phosphotransferasesystem; UPEC, uropathogenic E. coli; UTI, urinary tract infection.
Data deposition: The complete genome sequence of E. coli strain 536 has been deposited in the GenBank database (accession no. CP000247).
‖To whom correspondence should be addressed at: Institut für Molekulare Infektionsbiologie, Röntgenring 11, D-97070 Würzburg, Germany. E-mail: [email protected].
© 2006 by The National Academy of Sciences of the USA
www.pnas.org/cgi/doi/10.1073/pnas.0603038103 PNAS | August 22, 2006 | vol. 103 | no. 34 | 12879–12884
Figure 1.12: B-DNA double helix (minor groove and major groove labelled)
360° = one helical turn
10.5 bp per turn
34.3° twist angle (rotation per residue)
3.4 Å axial rise
Base pair tilt: -6°
Helix pitch: 35.7 Å
Helix diameter: 20 Å
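The values on this slide are linked by simple arithmetic, which the short Python sketch below makes explicit. The function names are illustrative only (they do not come from the slides); the inputs are the 10.5 bp per turn and 3.4 Å axial rise quoted above, and the fragment is treated as a straight helix.

```python
"""B-DNA helix arithmetic using the values quoted on the slide."""

BP_PER_TURN = 10.5    # base pairs per helical turn (slide value)
AXIAL_RISE_A = 3.4    # axial rise per base pair, in angstroms (slide value)


def twist_angle_deg(bp_per_turn: float = BP_PER_TURN) -> float:
    """Rotation per base pair: one full turn (360 degrees) shared by all bp."""
    return 360.0 / bp_per_turn


def helix_pitch_angstrom(bp_per_turn: float = BP_PER_TURN,
                         rise: float = AXIAL_RISE_A) -> float:
    """Length of one helical turn measured along the helix axis."""
    return bp_per_turn * rise


def contour_length_angstrom(n_bp: int, rise: float = AXIAL_RISE_A) -> float:
    """Axial length of an n_bp fragment, assuming a straight helix."""
    return n_bp * rise


if __name__ == "__main__":
    print(f"twist angle: {twist_angle_deg():.1f} degrees per bp")
    print(f"helix pitch: {helix_pitch_angstrom():.1f} angstroms")
    print(f"200 bp fragment: {contour_length_angstrom(200):.0f} angstroms")
```

With the slide's inputs this reproduces the 34.3° twist angle and the 35.7 Å helix pitch shown in the figure.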
Marini gel
Figure 2.1: A. Straight DNA, 200 bp. B. Bent DNA, 200 bp.
Agarose gel
Figure 2.5: Tilt and roll between adjacent A=T base pairs
Figure 2.6: Random bends; bends phased at 10.5 bp; bends phased at 16 bp
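Why the phasing matters can be illustrated with a toy calculation. The sketch below is a deliberately simplified planar model, not the method behind the figure: each bend is treated as a unit vector whose direction rotates with the helix (10.5 bp per turn), and the length of the vector sum stands in for the net curvature. The function name `net_bend` and the choice of six bends are assumptions made for illustration.

```python
import cmath
import math

HELICAL_REPEAT_BP = 10.5  # bp per turn, as quoted elsewhere in these slides


def net_bend(spacing_bp: float, n_bends: int = 6,
             helical_repeat: float = HELICAL_REPEAT_BP) -> float:
    """Length of the vector sum of n_bends unit bends placed every spacing_bp.

    Toy model: the direction of each bend is rotated by
    360 * spacing_bp / helical_repeat degrees relative to the previous one.
    In-phase bends add up to n_bends; out-of-phase bends largely cancel.
    """
    step = 2.0 * math.pi * spacing_bp / helical_repeat
    total = sum(cmath.exp(1j * step * k) for k in range(n_bends))
    return abs(total)


if __name__ == "__main__":
    for spacing in (10.5, 16.0):
        print(f"bends every {spacing} bp: net bend {net_bend(spacing):.2f} of 6")
```

In this model, bends spaced at the helical repeat all point the same way and add up fully, whereas at 16 bp successive bends point in nearly opposite directions (about 189° apart) and mostly cancel.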
Figure 2.3: CCCAAAAATGTCAAAAAATAGGCAAAAAATGCCAAAAATCCCAAC (positions 1, 11, 21, 31 and 41 marked)
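The spacing of the A-tracts in this sequence can be checked directly. The following Python sketch locates runs of four or more A's and reports the distance between the centres of neighbouring tracts; the threshold of four and the function names are assumptions chosen for illustration, not part of the original figure.

```python
import re

# Sequence copied from the slide.
SEQUENCE = "CCCAAAAATGTCAAAAAATAGGCAAAAAATGCCAAAAATCCCAAC"


def find_a_tracts(seq: str, min_len: int = 4):
    """Return (start, length) for each run of at least min_len A's.

    Start positions are 1-based, to match the ruler on the slide.
    """
    pattern = re.compile(r"A{%d,}" % min_len)
    return [(m.start() + 1, m.end() - m.start())
            for m in pattern.finditer(seq.upper())]


def centre_spacings(tracts):
    """Distances in bp between the centres of consecutive tracts."""
    centres = [start + (length - 1) / 2 for start, length in tracts]
    return [b - a for a, b in zip(centres, centres[1:])]


if __name__ == "__main__":
    tracts = find_a_tracts(SEQUENCE)
    print("A-tracts (start, length):", tracts)
    print("centre-to-centre spacings:", centre_spacings(tracts))
```

The tracts found this way lie roughly one helical turn apart, which is the phasing the figure is drawing attention to.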
GENOME ATLAS: E. coli pO157, GenBank accession AF074613, 92,077 bp
Center for Biological Sequence Analysis, http://www.cbs.dtu.dk/
Genetica, 108:47-51, 2000
Genes labelled: katP, espP, L7028, L7031, etpD, etpE, etpF, etpL, EHEC-hlyA, EHEC-hlyB, EHEC-hlyC, L7072, L7081, L7086, L7091, L7095, traI
Scale: 0k to 75k in steps of 12.5k
Lanes (scale range):
Intrinsic Curvature (dev avg): 0.07 to 0.30
Stacking Energy (dev avg): -9.52 to -6.40
Position Preference (dev avg): 0.11 to 0.17
Annotations: CDS +, CDS -
Direct Repeats (fix avg): 5.00 to 7.50
Inverted Repeats (fix avg): 5.00 to 7.50
Percent AT (fix avg): 0.30 to 0.70
Resolution: 19
Promoter structural profile
Labels: “UP”, -35, -10 “TATA box”
Plot: Ecoli_K-12_W3110_Main: Stacking energy
x-axis: Distance from translation start (-400 to 400)
y-axis: Z-score (-0.05 to 0.25)
Label: CURVATURE
Plot: Ecoli_K-12_W3110_Main: Structural Profile
x-axis: Distance from translation start (-400 to 400)
y-axis: Z-score (-0.05 to 0.20)
Curves: AT content, Position Preference, Stacking Energy, Intrinsic Curvature, DNAase sensitivity
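The Z-score axis in these profiles expresses each structural measure relative to the whole genome. As a minimal sketch (not the group's actual pipeline), the Python below averages a per-position measure over many windows aligned on the translation start and converts the result to Z-scores using a genomic mean and standard deviation. The function name, the toy numbers and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def zscore_profile(aligned_windows, genome_values):
    """Average aligned windows column by column, then express as Z-scores.

    aligned_windows: list of equal-length lists, one per gene, each holding a
        structural measure (for example stacking energy) at positions
        relative to the translation start.
    genome_values: the same measure sampled across the whole genome, used
        for the reference mean and standard deviation.
    """
    mu = mean(genome_values)
    sigma = pstdev(genome_values)
    if sigma == 0:
        raise ValueError("genome_values must not be constant")
    column_means = [mean(column) for column in zip(*aligned_windows)]
    return [(m - mu) / sigma for m in column_means]


if __name__ == "__main__":
    # Toy data only: three 'genes', five positions each.
    windows = [
        [1.0, 2.0, 3.0, 2.0, 1.0],
        [1.0, 2.0, 5.0, 2.0, 1.0],
        [1.0, 2.0, 4.0, 2.0, 1.0],
    ]
    genome = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0]
    print(zscore_profile(windows, genome))
```

Because every measure ends up in standard-deviation units, curves with different native units (such as AT content and stacking energy) can be drawn on one shared axis, as in the structural profile plot.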
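Promoter structural profile
Diagram: promoter region showing the -35 and -10 elements, the +1 start site, the CDS and the mRNA, with RNA polymerase subunits α, σ and β / β’ drawn over the promoter.
Annotations: “DNA curvature, flexibility important here”; “melts”; “rigid”; “cruciform”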
GENOME ATLAS: rrsA, 20,000 bp
Center for Biological Sequence Analysis, http://www.cbs.dtu.dk/
Genes labelled: ubiD, trkH, fadB, rrsA, rrlA
Scale: 0k to 20k in steps of 2.5k
Lanes (each sigma lane: fix avg, 0.00 to 0.50):
A) sigma -0.030
B) sigma -0.035
C) sigma -0.040
D) Annotations: CDS +, CDS -, rRNA, tRNA
E) sigma -0.045
F) sigma -0.050
G) sigma -0.055
H) sigma -0.060
I) sigma -0.065
J) sigma -0.070
Resolution: 9
Figures 2.10 and 2.11: "Refined" Junction Model
Labels: Tilt; Roll + Tilt; sequences CCGG/GGCC and CGGC/GCCG; an A·T tract (AAAAA/TTTTT); 5' and 3' strand ends marked
nucleosomes
chromatin
Trimmed 1
Trimmed 2
“Travers” trinucleotide scale:
trinuc.   % Out
AAT/ATT   -0.280
AAA/TTT   -0.274
CCA/TGG   -0.246
AAC/GTT   -0.205
ACT/AGT   -0.183
CCG/CGG   -0.136
ATC/GAT   -0.110
AAG/CTT   -0.081
CGC/GCG   -0.077
AGG/CCT   -0.057
GAA/TTC   -0.037
ACG/CGT   -0.033
ACC/GGT   -0.032
GAC/GTC   -0.013
CCC/GGG   -0.012
ACA/TGT   -0.006
CGA/TCG   -0.003
GGA/TCC    0.013
CAA/TTG    0.015
AGC/GCT    0.017
GTA/TAC    0.025
AGA/TCT    0.027
CTC/GAG    0.031
CAC/GTG    0.040
TAA/TTA    0.068
GCA/TGC    0.076
CTA/TAG    0.090
GCC/GGC    0.107
ATG/CAT    0.134
CAG/CTG    0.175
ATA/TAT    0.182
TCA/TGA    0.194
Labels on the scale: “Low position pref.” (once) and “High position pref.” (twice)
Satchwell, S.C., Drew, H.R., and Travers, A.A., "Sequence periodicities in chicken nucleosome core DNA", J. Mol. Biol., 191:659-675, (1986).
Baldi, P., Brunak, S., Chauvin, Y., and Krogh, A., "Naturally occurring nucleosome positioning signals in human exons and introns", J. Mol. Biol., 263:503-510, (1996).
NOTE: we use a (slight) modification, in which the absolute value (magnitude) of the values is used to reflect trinucleotides which tend to exclude nucleosomes.
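The modification described in the note can be sketched in a few lines of Python. The table values below are copied from the slide; everything else (the function names, averaging over all overlapping trinucleotides, and skipping trinucleotides that contain ambiguous bases) is an assumption made for illustration and is not necessarily how the group's own software computes position preference.

```python
# Trinucleotide scale copied from the slide ("% Out" column).
TRAVERS_TABLE = """
AAT/ATT -0.280 AAA/TTT -0.274 CCA/TGG -0.246 AAC/GTT -0.205
ACT/AGT -0.183 CCG/CGG -0.136 ATC/GAT -0.110 AAG/CTT -0.081
CGC/GCG -0.077 AGG/CCT -0.057 GAA/TTC -0.037 ACG/CGT -0.033
ACC/GGT -0.032 GAC/GTC -0.013 CCC/GGG -0.012 ACA/TGT -0.006
CGA/TCG -0.003 GGA/TCC 0.013 CAA/TTG 0.015 AGC/GCT 0.017
GTA/TAC 0.025 AGA/TCT 0.027 CTC/GAG 0.031 CAC/GTG 0.040
TAA/TTA 0.068 GCA/TGC 0.076 CTA/TAG 0.090 GCC/GGC 0.107
ATG/CAT 0.134 CAG/CTG 0.175 ATA/TAT 0.182 TCA/TGA 0.194
"""


def build_scale(table: str = TRAVERS_TABLE) -> dict:
    """Map every trinucleotide to the magnitude of its table value."""
    tokens = table.split()
    scale = {}
    for pair, value in zip(tokens[::2], tokens[1::2]):
        for trinucleotide in pair.split("/"):
            # The note on the slide: use the absolute value of the scale.
            scale[trinucleotide] = abs(float(value))
    return scale


SCALE = build_scale()


def position_preference(seq: str) -> float:
    """Mean magnitude over all overlapping trinucleotides in seq."""
    seq = seq.upper()
    values = [SCALE[seq[i:i + 3]]
              for i in range(len(seq) - 2)
              if seq[i:i + 3] in SCALE]
    if not values:
        raise ValueError("sequence contains no scorable trinucleotide")
    return sum(values) / len(values)


if __name__ == "__main__":
    for example in ("AAATTT", "CGATCG"):
        print(example, round(position_preference(example), 3))
```

Taking the magnitude means that trinucleotides from either end of the table score high, while those near zero in the middle of the table score low.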
A genomic code for nucleosome positioning
Eran Segal¹, Yvonne Fondufe-Mittendorf², Lingyi Chen², AnnChristine Thåström², Yair Field¹, Irene K. Moore², Ji-Ping Z. Wang³ & Jonathan Widom²
Eukaryotic genomes are packaged into nucleosome particles that occlude the DNA from interacting with most DNA binding proteins. Nucleosomes have higher affinity for particular DNA sequences, reflecting the ability of the sequence to bend sharply, as required by the nucleosome structure. However, it is not known whether these sequence preferences have a significant influence on nucleosome position in vivo, and thus regulate the access of other proteins to DNA. Here we isolated nucleosome-bound sequences at high resolution from yeast and used these sequences in a new computational approach to construct and validate experimentally a nucleosome–DNA interaction model, and to predict the genome-wide organization of nucleosomes. Our results demonstrate that genomes encode an intrinsic nucleosome organization and that this intrinsic organization can explain ~50% of the in vivo nucleosome positions. This nucleosome positioning code may facilitate specific chromosome functions including transcription factor binding, transcription initiation, and even remodelling of the nucleosomes themselves.
Eukaryotic genomic DNA exists as highly compacted nucleosome arrays called chromatin. Each nucleosome contains a 147-base-pair (bp) stretch of DNA, which is sharply bent and tightly wrapped around a histone protein octamer¹. This sharp bending occurs at every DNA helical repeat (~10 bp), when the major groove of the DNA faces inwards towards the histone octamer, and again ~5 bp away, with opposite direction, when the major groove faces outward. Bends of each direction are facilitated by specific dinucleotides²,³. Neighbouring nucleosomes are separated from each other by 10–50-bp-long stretches of unwrapped linker DNA⁴; thus, 75–90% of genomic DNA is wrapped in nucleosomes. Access to DNA wrapped in a nucleosome is occluded¹ for polymerase, regulatory, repair and recombination complexes, yet nucleosomes also recruit other proteins through interactions with their histone tail domains⁵. Thus, the detailed locations of nucleosomes along the DNA may have important inhibitory or facilitatory roles⁶,⁷ in regulating gene expression.
DNA sequences differ greatly in their ability to bend sharply²,³,⁸. Consequently, the ability of the histone octamer to wrap differing DNA sequences into nucleosomes is highly dependent on the specific DNA sequence⁹,¹⁰. In vitro studies show this range of affinities to be 1,000-fold or greater¹¹. Thus, nucleosomes have substantial DNA sequence preferences. A key question is whether genomes use these sequence preferences to control the distribution of nucleosomes in vivo in a way that strongly impacts on the ability of DNA binding proteins to access particular binding sites. By controlling binding site accessibility in this way, genomes could, for example, target the binding of transcription factors towards appropriate sites and away from irrelevant, non-functional sites⁹.
One view is that the sequence preferences of nucleosomes might not be meaningful. Nucleosome positions might be regulated in cells in trans by the abundant¹² ATP-dependent nucleosome remodelling complexes¹³, which might over-ride the sequence preferences of nucleosomes and move them to new locations whenever needed. Another view, however, is that remodelling factors do not themselves determine the destinations of the nucleosomes that they mobilize. Rather, the remodelling complexes may allow nucleosomes to sample alternative positions rapidly, resulting in a thermodynamic equilibrium between the nucleosomes and the site-specific DNA binding proteins that compete with nucleosomes for occupancy along the genome. In this view, nucleosome positions are regulated in cis by their intrinsic sequence preferences, which would then have significant regulatory roles. In this cis regulation model, we expect the genome to encode a nucleosome organization, intrinsic to the DNA sequence alone, comprising sequences with both low and high affinity for nucleosomes. Many of the high-affinity sequences should then be occupied by nucleosomes in vivo. Moreover, the detailed distribution of nucleosome positions encoded by the genome should significantly influence chromosome functions genome-wide.
Here we report the results of a combined experimental and computational approach to detect the DNA sequence preferences of nucleosomes and the intrinsic nucleosome organization of the genome that these preferences dictate. Our findings demonstrate that eukaryotic genomes use a nucleosome positioning code, and link the resulting nucleosome positions to specific chromosome functions.
Validating a nucleosome–DNA interaction model
To construct a model for nucleosome–DNA interactions in yeast (Fig. 1a), we used a genome-wide assay to isolate DNA regions that were stably wrapped in nucleosomes. Our experimental method maps nucleosomes on the yeast genome with greater accuracy than previous approaches, resulting in a set of 199 mononucleosome DNA sequences of length 142–152 bp (Supplementary Fig. 1). We used this collection of sequences to construct a probabilistic model that represents the DNA sequence preferences of yeast nucleosomes (Supplementary Fig. 2). Our approach resembles that used for representing the binding specificities of transcription factors from a collection of known sites, but with two main distinctions: first, in contrast to the mononucleotide probability distributions used for
¹Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel. ²Department of Biochemistry, Molecular Biology and Cell Biology, Northwestern University, 2153 Sheridan Road, Evanston, Illinois 60208, USA. ³Department of Statistics, Northwestern University, 2006 Sheridan Road, Evanston, Illinois 60208, USA.
Vol 442 | 17 August 2006 | doi:10.1038/nature04979
772 © 2006 Nature Publishing Group
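The packaging figures quoted on this page (147 bp wrapped per nucleosome, linkers of 10–50 bp) can be related with a one-line calculation. The Python sketch below assumes an idealised array in which every linker has the same length; that uniformity, and the function name, are assumptions made for illustration and are not part of the paper.

```python
NUCLEOSOME_BP = 147  # DNA wrapped per nucleosome, as quoted in the text


def wrapped_fraction(linker_bp: float,
                     nucleosome_bp: float = NUCLEOSOME_BP) -> float:
    """Fraction of DNA wrapped in nucleosomes in a regular array.

    Each repeat unit is one nucleosome plus one linker of linker_bp.
    """
    if linker_bp < 0:
        raise ValueError("linker_bp must be non-negative")
    return nucleosome_bp / (nucleosome_bp + linker_bp)


if __name__ == "__main__":
    for linker in (10, 50):
        print(f"linker {linker} bp: {wrapped_fraction(linker):.1%} wrapped")
```

For 50 bp and 10 bp linkers this idealised array gives roughly 75% and 94% respectively, broadly in line with the 75–90% range stated in the text.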
July 25, 2006
Scientists Say They’ve Found a Code Beyond Genetics in DNA
By NICHOLAS WADE
Researchers believe they have found a second code in DNA in addition to the genetic code.
The genetic code specifies all the proteins that a cell makes. The second code, superimposed on the first, sets the placement of the nucleosomes, miniature protein spools around which the DNA is looped. The spools both protect and control access to the DNA itself.
The discovery, if confirmed, could open new insights into the higher order control of the genes, like the critical but still mysterious process by which each type of human cell is allowed to activate the genes it needs but cannot access the genes used by other types of cell.
The new code is described in the current issue of Nature by Eran Segal of the Weizmann Institute in Israel and Jonathan Widom of Northwestern University in Illinois and their colleagues.
There are about 30 million nucleosomes in each human cell. So many are needed because the DNA strand wraps around each one only 1.65 times, in a twist containing 147 of its units, and the DNA molecule in a single chromosome can be up to 225 million units in length.
Biologists have suspected for years that some positions on the DNA, notably those where it bends most easily, might be more favorable for nucleosomes than others, but no overall pattern was apparent. Drs. Segal and Widom analyzed the sequence at some 200 sites in the yeast genome where nucleosomes are known to bind, and discovered that there is indeed a hidden pattern.
Knowing the pattern, they were able to predict the placement of about 50 percent of the nucleosomes in other organisms.
The pattern is a combination of sequences that makes it easier for the DNA to bend itself and wrap tightly around a nucleosome. But the pattern requires only some of the sequences to be present in each nucleosome binding site, so it is not obvious. The looseness of its requirements is presumably the reason it does not conflict with the genetic code, which also has a little bit of redundancy or wiggle room built into it.
Having the sequence of units in DNA determine the placement of nucleosomes would explain a puzzling feature of transcription factors, the proteins that activate genes. The transcription factors recognize short sequences of DNA, about six to eight units in length, which lie just in front of the gene to be transcribed. [...] of the nucleosome code was “a profound insight if true,” because it would explain many aspects of how the DNA is controlled.
S. pombe - 3 chromosomes
Nature, 415:871-880, (2002).
S. pombe chr. 1
DNA Conformation and Transcription, chapter 14 (edited by Takashi Ohyama, 2004).
S. pombe zoom
DNA Conformation and Transcription, chapter 14 (edited by Takashi Ohyama, 2004).
GENOME ATLAS: E. coli K-12 isolate MG1655, 4,639,221 bp
Center for Biological Sequence Analysis, http://www.cbs.dtu.dk/
Features labelled: Origin, Terminus; rhsA, rhsB, rhsC, rhsD, rhsE; phnM, yagF; rrsA, rrsB, rrsC, rrsD, rrsE, rrsG, rrsH
Scale: 0M to 4M in steps of 0.5M
Lanes (scale range):
Intrinsic Curvature (dev avg): 0.16 to 0.21
Stacking Energy (dev avg): -8.74 to -7.63
Position Preference (dev avg): 0.14 to 0.16
Annotations: CDS +, CDS -, rRNA, tRNA
Watson Repeats (fix avg): 5.00 to 7.50
Crick Repeats (fix avg): 5.00 to 7.50
GC Skew (fix avg): -0.03 to 0.03
Percent AT (fix avg): 0.45 to 0.55
Resolution: 1856
ygeG
rfaJ
rpoB
flu
ycgZ
rpsQ
b2651
phnI
rpsO
nohB
yagG
atpF
lpdA
ydfO
b0725
yjhA
yhiF
yi21_2
b037
3
0M0.5M
1M1.5M
2M2.5M
3M3.
5M4M
GENOME ATLAS
Center for Biological Sequence Analysishttp://www.cbs.dtu.dk/
Intrinsic Curvaturedevavg
0.16 0.21
Stacking Energydevavg
-8.66 -7.71
Position Preferencedevavg
0.14 0.16
Annotations:
CDS +
CDS -
rRNA
tRNA
Global Direct Repeatsfixavg
5.00 7.50
Global Inverted Repeatsfixavg
5.00 7.50
GC Skewdevavg
-0.05 0.05
Percent ATfixavg
0.20 0.80
Resolution: 1856
E. coli 4,639,221 bp
J. Mol. Biol, 299:907-930, (2000).
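Two of the atlas lanes, GC Skew and Percent AT, are simple base-composition measures computed along the chromosome. The sketch below shows one way to compute them, assuming the usual definition of GC skew, (G - C) / (G + C), and non-overlapping windows. The function name `window_stats`, the window size, and the demo sequence are illustrative assumptions and are not taken from the Genome Atlas software.

```python
def window_stats(seq, window):
    """Return (start, at_fraction, gc_skew) for consecutive non-overlapping windows.

    at_fraction is (A + T) / window; gc_skew is (G - C) / (G + C).
    A trailing partial window is ignored.
    """
    seq = seq.upper()
    rows = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        a, t = chunk.count("A"), chunk.count("T")
        g, c = chunk.count("G"), chunk.count("C")
        at_fraction = (a + t) / window
        # Guard against windows that contain no G or C at all.
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        rows.append((start, at_fraction, gc_skew))
    return rows


if __name__ == "__main__":
    # Toy sequence: a G-rich window followed by a C-rich window.
    demo = "GGGGCATATA" + "CCCCGATATA"
    for start, at_fraction, gc_skew in window_stats(demo, 10):
        print(start, round(at_fraction, 2), round(gc_skew, 2))
```

On a real chromosome the sign of the GC skew tends to switch near the origin and terminus, which is why the lane is useful for locating them; the toy sequence only demonstrates the arithmetic.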
Structural cluster analysis
In order to investigate whether there is a connection between the predicted structural characteristics of a gene on one hand, and its function on the other hand, we performed a cluster analysis based on the five structural parameters. Briefly, we calculated the average of each measure for all 5000 bp windows centered on RNA or protein-encoding genes, normalized the values (based on the genomic average and standard deviation), and then used the resulting five values to cluster the windows. We thus treat each gene-centered window as a point in a five-dimensional “structure space”, and use the Euclidean distance between them as a simple measure of structural similarity (see Methods for more details on clustering).
Figure 7 is a distance tree that summarize the overall topology of this space. In this plot, all genes have been divided into 11 clusters and the tree shows the relative position of the mid-points of these (the five coordinates of each centroid, in standard deviation units, is given in the legend to Figure 7). At the base of each branch, the number of genes in that cluster is indicated.
To analyze whether there is a connection between the tree structure and gene functionality, we used word-analysis software that we have developed for investigation of yeast promoters (Jensen & Knudsen, 2000). Briefly, the approach was as follows. From the GenBank file we first collected all the functional annotations for each gene. For each cluster we then divided the annotation into two groups: one containing the annotation for the investigated cluster (the positive set) and another group containing the annotation for all the remaining genes (the negative set). By counting word frequencies in the two sets and using hypergeometric statistics, it was then possible to find annotation keywords that are significantly over-represented in the positive set (Jensen & Knudsen, 2000). On Figure 7 we have indicated the most significant words found in this way.
As it can be seen, several groups did display significant over-representation of keywords. In some cases a cluster contains only a few genes, most of which belong to one or more operons of related function. For example, among the 20 genes in cluster 11, ten belong to the phn operon, giving the entire cluster an over-representation of the keywords “phosphonate metabolism”. Another example is cluster 4 which contains a number of genes from two different groups of genes that each are involved in lipopolysaccharide synthesis (the rfa and wbb genes). Cluster 4 is characterized by extreme values for all five measures, and displays a very high level of curvature, very low flexibility and extremely high stacking energy (corresponding to unstable DNA). Cluster 5 is structurally similar to cluster 4 and contains three additional rfa genes. Interestingly, E. coli strains lacking the HU protein, display a phenotype that resembles the deep-rough phenotype seen in rfa mutants (Painbeni et al., 1997). Since HU is known to interact with curved or kinked DNA, and since most of the rfa operon displays extreme structural properties (including very high curvature), it is tempting to suggest that the interaction between HU and this chromosomal region plays a role in expression of the genes, although it has been reported that apparently the HU-deficient mutants do not have a truncated lipopolysaccharide (Painbeni et al., 1997).
Cluster 10 is adjacent to clusters 4 and 5, and displays similar (but less extreme) structural features of high curvature, low flexibility, and high stacking energy. Consistently, ten of the 20 extreme 1000 bp regions discussed above (and which displayed the same structural features; Table 1) are included in one of these three clusters. In agreement with the genome-wide trend for higher curvature near the terminus, the genes in cluster 10 show a weak tendency to be located in this region (data not shown). This fact is also consistent with the observation that the keyword “phage” is over-represented in cluster 10 (20 of the 104 genes have the word in their annotation), since the terminus region is known to contain many phage and transposon-related genes (Hill, 1996).
Figure 7. Structural cluster analysis. Distance tree showing the relative location of 11 gene clusters based on average structural measures. The number of genes in each cluster is indicated at the base of the branch. Significantly over-represented annotation keywords are indicated at the end of branches. Names of genes mentioned in the text are indicated in parentheses. The centroid coordinates of the 11 clusters are given below (values are normalized and are in units of standard deviations from genomic average. They are listed in the following order: curvature, DNaseI, position preference, deformability, and stacking energy).
Cluster 1: 0.7, -0.6, 0.3, -0.6, 0.6.
Cluster 2: -0.6, 0.3, 0.1, 0.6, -0.6.
Cluster 3: -0.3, 0.9, -1.6, -0.3, 0.1.
Cluster 4: 3.3, -2.4, 2.1, -4.4, 4.6.
Cluster 5: 3.0, -3.3, 2.2, -2.8, 3.1.
Cluster 6: -0.5, 2.6, -3.3, 0.7, 0.5.
Cluster 7: -1.6, 1.1, -4.6, 0.1, -0.8.
Cluster 8: -1.7, 2.0, -1.7, 0.8, -0.9.
Cluster 9: -2.1, 3.9, -2.8, 0.7, -0.6.
Cluster 10: 2.3, -1.3, 0.7, -2.8, 2.8.
Cluster 11: -2.9, 2.2, 1.1, 2.7, -2.8.
J. Mol. Biol, 299:907-930, (2000).
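The excerpt above treats each gene-centered window as a point in a five-dimensional structure space and uses Euclidean distance as the similarity measure. As a rough illustration, the sketch below applies that distance to the 11 centroid coordinates from the Figure 7 legend (order: curvature, DNaseI, position preference, deformability, stacking energy). The helper names `euclidean` and `nearest` are assumptions made for this example, and the nearest-centroid lookup is only a simple comparison, not the tree-building procedure used in the paper.

```python
import math

# Centroids in standard-deviation units, as listed in the Figure 7 legend.
# Order: curvature, DNaseI, position preference, deformability, stacking energy.
CENTROIDS = {
    1: (0.7, -0.6, 0.3, -0.6, 0.6),
    2: (-0.6, 0.3, 0.1, 0.6, -0.6),
    3: (-0.3, 0.9, -1.6, -0.3, 0.1),
    4: (3.3, -2.4, 2.1, -4.4, 4.6),
    5: (3.0, -3.3, 2.2, -2.8, 3.1),
    6: (-0.5, 2.6, -3.3, 0.7, 0.5),
    7: (-1.6, 1.1, -4.6, 0.1, -0.8),
    8: (-1.7, 2.0, -1.7, 0.8, -0.9),
    9: (-2.1, 3.9, -2.8, 0.7, -0.6),
    10: (2.3, -1.3, 0.7, -2.8, 2.8),
    11: (-2.9, 2.2, 1.1, 2.7, -2.8),
}


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest(cluster_id):
    """Return the id of the other cluster whose centroid is closest."""
    here = CENTROIDS[cluster_id]
    others = (k for k in CENTROIDS if k != cluster_id)
    return min(others, key=lambda k: euclidean(here, CENTROIDS[k]))


if __name__ == "__main__":
    for cid in (4, 5, 10):
        other = nearest(cid)
        dist = euclidean(CENTROIDS[cid], CENTROIDS[other])
        print(f"cluster {cid}: nearest centroid is cluster {other} ({dist:.2f} SD)")
```

With these coordinates, clusters 4 and 5 are each other's nearest neighbours and cluster 10 lies closest to cluster 5, which agrees with the description of clusters 4, 5 and 10 as a structurally related group.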
low position preference
[Figure: four panels, each titled "Histogram of log(affy.ex[, i])"; x-axis: log(affy.ex[, i]), 7 to 10; y-axis: Density, 0.0 to 0.6.]
Highly Expressed Genes
niceDogma
Biochimie, 83:201-212, (2001).
Summary
1. DNA sequence -> DNA structure -> Function
2. DNA curvature is caused mainly by phased A-tracts.
3. Where you are in the chromosome is important!
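Point 2 can be made concrete with a small sequence scan. The sketch below is a minimal illustration, assuming that an A-tract is a run of at least four A's (or four T's, which is an A-tract on the other strand) and that "phased" means a start-to-start spacing of 10 to 11 bp, roughly one helical turn. These thresholds and the function names `a_tract_starts` and `phased_pairs` are assumptions for the example; this is not the curvature model behind the Intrinsic Curvature lane of the atlas.

```python
import re


def a_tract_starts(seq, min_len=4):
    """Start positions of runs of at least min_len A's or T's."""
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_len, min_len))
    return [m.start() for m in pattern.finditer(seq.upper())]


def phased_pairs(starts, low=10, high=11):
    """Count neighbouring tracts spaced about one helical turn apart.

    Only single-turn spacing is counted; tracts two or more turns apart
    are ignored by this simplified check.
    """
    return sum(1 for a, b in zip(starts, starts[1:]) if low <= b - a <= high)


if __name__ == "__main__":
    in_phase = "AAAAACGCGC" * 4            # A-tract every 10 bp
    out_of_phase = "AAAAACGCGCGCGCGC" * 4  # A-tract every 16 bp
    for name, seq in (("in phase", in_phase), ("out of phase", out_of_phase)):
        starts = a_tract_starts(seq)
        print(name, starts, phased_pairs(starts))
```

A sequence with many in-phase pairs is a candidate for intrinsic curvature because the small bends at each tract add up in the same direction; tracts spaced half a turn off tend to cancel instead.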