Genetics of Bacterial Genomes http://www.pasteur.fr/recherche/unites/REG/
CAN WE DESIGN DE NOVO A BACTERIAL GENOME?
Synthetic Biology 3.0 Conference, Zürich, 25 June 2007
AUTHORS
Génétique in silico: Marc Bailly-Béchet, Massimo Vergassola
Abdus Salam International Centre for Theoretical Physics: Mudassar Iqbal, Matteo Marsili
Génétique des Génomes Bactériens (in silico): Gang Fang, Etienne Larsabal, Eduardo Rocha
SYNOPSIS
Physics will help, not hinder, our efforts to construct a synthetic cell, provided we take the right constraints into account [we should also remember chemical constraints]
Biological constraints are essential, including:
  The paleome, reminiscent of a scenario of the origin of life, which forms a replicator and a constructor, as in a « living computer »
  The cenome, which allows cells to live in and explore a particular niche [cf. « biocenose »]
A synthetic genome needs to put together counterparts of the first class. The second class, which could be the subject of specific design, will achieve the goal of the construct
LIFE AND COMPUTATION
SOME SIMPLE PHYSICAL CONSTRAINTS
TRANSLATION ORGANIZES THE BACTERIAL GENOME
THE PALEOME: CONSTRUCTOR AND REPLICATOR
THE CENOME: THE “PURPOSE” OF THE MACHINE
TOWARDS “SYNTHETIC BIOLOGY”
Three processes constitute life:
Information transfer; genomics unveils the organization of the program associated with the cell
Forces coupling the genome structure to the structure of the cell:
  Metabolism
  Compartmentalization
The cell is the atom of life
A first hint of a link between genome organization and the architecture of the organism is visible here: prokaryotic genome sequences look random, eukaryotic genomes look repeated
WHAT LIFE IS
A. Danchin. The Delphic Boat: What Genomes Tell Us. (2003) Harvard University Press
THE “GENETIC PROGRAM”
Physics: matter, energy, time
Statistical physics: Physics + information
Biology: Physics + information, coding, control...
Arithmetic: sequences of integers, recursion, coding...
Computation: Arithmetic + programs + machine...
The « genetic program » metaphor has practical consequences: we know how to manipulate genes and gene products; can we push the metaphor to its ultimate consequences?
Two entities permit computing:
A machine able to read and write
A program on a physical support (typically, a punched or magnetic tape illustrates the sequential order of the symbols that make the program), split (in practice, but not conceptually) into two entities:
  Program (providing the goal)
  Data (providing the context)
The machine is distinct from the program
WHAT COMPUTING IS
THE TURING MACHINE
The machine (read/write) is physically distinct from the program (data), written as a linear sequence of symbols
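The separation between machine and program can be sketched in a few lines of Python. This is an illustrative toy, not material from the talk: the function `run` plays the machine (a read/write head), while the transition table `FLIP_BITS` and the tape are plain data handed to it. All names are hypothetical.

```python
def run(program, tape, state="start", blank="_", max_steps=1000):
    """The machine: reads a symbol, writes a symbol, moves the head.

    `program` maps (state, symbol) -> (new_state, symbol_to_write, move),
    where move is -1 (left) or +1 (right). The machine knows nothing
    about what the program is for.
    """
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(head, blank)
        state, cells[head], move = program[(state, symbol)]
        head += move
    return "".join(cells[i] for i in sorted(cells)).strip(blank)


# The program: pure data, kept apart from `run`.
# This one inverts every bit and halts at the first blank.
FLIP_BITS = {
    ("start", "0"): ("start", "1", +1),
    ("start", "1"): ("start", "0", +1),
    ("start", "_"): ("halt", "_", +1),
}

print(run(FLIP_BITS, "10110"))  # prints 01001
```

Swapping `FLIP_BITS` for another table changes what is computed without touching `run`, which is the point of the metaphor: the same machine can execute a program received from elsewhere.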
CELLS AND COMPUTERS
Genetics rests on the description of genomes as texts written with a four-letter alphabet: do cells behave as computers?
  Horizontal gene transfer
  Viruses
  Genetic engineering
  Animal cloning (and now direct transformation of a whole genome into a recipient cell)
all point to a separation between
« Machine » (the cell factory) and Data + program
AN ALGORITHMIC VIEW OF BIOLOGICAL ACTIONS
Replication, transcription, translation: high parallelism
“Begin, Check Control Points, Repeat, End”
The action is always oriented, with a beginning and an end
The processes of time control (check points) are rarely taken into account (except for the replication/division processes), but their role is essential to allow coordination of multiple actions in parallel
A RECURSIVE MACHINE
Replicator: DNA specifies proteins that replicate DNA
Constructor: DNA specifies proteins which form the machine that constructs the cell
John von Neumann, trying to understand the brain, suggested that if a computer were both to behave as a computer and to construct the machine itself, it should harbour an image of the machine somewhere.
That special computer had to be split into a replicator and a constructor, which expresses the program for construction of both the replicator and the constructor.
The metaphor does not appear to apply to the brain; does it apply to the cell?
IS THERE A MAP OF THE CELL IN THE CHROMOSOME?
Drosophiloculus, Homunculus? Celluloculus?
The mystery of homeogenes' origin
SOME SIMPLE PHYSICAL CONSTRAINTS
THE PHYSICS OF REPLICATION
DNA forms a long folded thread: how do the daughter molecules separate?
Are physical constraints reflected in the sequence?
[Replication is oriented: the physics of a strand cannot be that of its complement]
A widespread idea links entropy with disorder
[Figure: scattered particles shown at time 0 and at time t. Image by Benjamin Crowell, licensed under the Creative Commons Attribution-ShareAlike license]
S = k log Ω
However, an increase in entropy is enough to separate chromosomes
[Figure: chromosome separation between 0 and 30 min]
Jun S, Mulder B. Entropy-driven spatial organization of highly confined polymers: lessons for the bacterial chromosome. Proc Natl Acad Sci U S A. 2006 103:12388-93.
Evolution optimises this physical phenomenon, while DNA also needs to support gene sequences
This is witnessed by:
  A period 3, signature of the codon succession in genes (constrained by the genetic code rule)
  A period 10-11.5 of yet unknown function...
A NEED FOR SYNTHETIC BIOLOGY
PERIODS IN GENOMES
One observes a correlation between base pairs with period three.
After deconvolution of this period there remains a somewhat fuzzy period of 10 to 11.5 base pairs
Eskesen et al. BMC Molecular Biology Volume 5, 12, 2004
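How a period-three correlation shows up can be illustrated with a small sketch. It is not the method of the cited paper: it simply counts how often a base equals the base `lag` positions further on, using a synthetic sequence whose three codon positions each prefer a different base. The function names and the weights are invented for illustration.

```python
import random


def match_fraction(seq, lag):
    """Fraction of positions i where seq[i] equals seq[i + lag]."""
    pairs = len(seq) - lag
    return sum(seq[i] == seq[i + lag] for i in range(pairs)) / pairs


def toy_coding_sequence(n_codons, seed=0):
    """Synthetic 'gene': each codon position prefers a different base.

    The weights are invented for illustration; real codon biases are
    weaker and gene-specific.
    """
    rng = random.Random(seed)
    weights = ([1, 1, 5, 1], [5, 1, 1, 1], [1, 1, 1, 5])  # order: A, C, G, T
    return "".join(
        rng.choices("ACGT", weights=weights[i % 3])[0]
        for i in range(3 * n_codons)
    )


seq = toy_coding_sequence(1000)
for lag in range(1, 13):
    print(lag, round(match_fraction(seq, lag), 3))
```

In this toy sequence the lags that are multiples of three stand out clearly. Looking for the weaker 10 to 11.5 bp signal would first require removing the period-three component, which this sketch does not attempt.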
A UNIVERSAL FEATURE OF THE GENOME TEXT: 10-11.5
[Figure: Helicobacter pylori; panels labelled "motifs", "real", "model" and "validation"; horizontal axis from 0 to 100 bp; vertical axis fs(G-) from -0.01 to 0.01]
TYPE A FLEXIBLE MOTIFS
methods
Motifs, written 5' to 3' along a position scale running from -10 through 0 to 10 bp:
1- xAxxxxTxxxxAxxxxTTxxxxxAxxxxTxxxxAxxx: All domains
2- xxxxxxxxxxxGxxxxTTxxxCxxxxxTxxxxxxxxx: Proteobacteria
4- xxxxxxTxxxxAGxxxTTxxxxxxxxTxxxxxxxxxx: Archaea
TTxxxGxxxTxxxxxxxxxxTT
Nucleotides forming this class of motifs are fully accessible on this side of the helix and the dinucleotides are located in the small groove
Nucleotides forming this class of motifs are fully accessible on this side of the helix and the dinucleotides are located in the large groove
[Figure: DNA helix with the motif nucleotides labelled (TT, G, T, TT, AA, C, A, AA)]
FLEXIBLE MOTIFS ACCOMMODATE LOCAL VARIATIONS OF THE DNA STRUCTURE
The flexibility of these motifs allows DNA to accommodate superturns and bends
Larsabal E, Danchin A. Genomes are covered with ubiquitous 11 bp periodic patterns, the "class A flexible patterns". BMC Bioinformatics. 2005 6:206
TRANSLATION ORGANIZES THE BACTERIAL GENOME
Codon usage biases are associated with functions
Class I: central metabolism (core metabolism)
Class II: high expression in exponential growth (genes highly expressed in exponential growth)
Class III: horizontal transfer (horizontally exchanged genes)
e.g. the codon usage of the gene involved in histidine metabolism is highly biased
Medigue C, Rouxel T, Vigier P, Henaut A, Danchin A. Evidence for horizontal gene transfer in Escherichia coli speciation. J Mol Biol. 1991 222:851-856.
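What "codon usage bias" measures can be made concrete with a short sketch. It computes, for each amino acid, the share taken by each of its synonymous codons in one coding sequence. The partial codon table, the function names and the example sequence are assumptions made for illustration; they are not taken from the cited study.

```python
from collections import Counter

# Partial standard genetic code, enough for the toy example.
SYNONYMS = {
    "His": ("CAT", "CAC"),
    "Lys": ("AAA", "AAG"),
    "Glu": ("GAA", "GAG"),
}


def codon_counts(cds):
    """Count codons in a coding sequence read in frame from position 0."""
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def synonymous_usage(cds, synonyms=SYNONYMS):
    """For each amino acid, the share of each synonymous codon (0..1)."""
    counts = codon_counts(cds)
    usage = {}
    for amino_acid, codons in synonyms.items():
        total = sum(counts[c] for c in codons)
        if total:  # skip amino acids absent from this gene
            usage[amino_acid] = {c: counts[c] / total for c in codons}
    return usage


gene = "CACCACCACCATAAAAAAAAG"  # invented sequence, not a real gene
print(synonymous_usage(gene))
```

An unbiased gene would give shares close to 0.5 for each pair; here CAC takes three quarters of the histidine codons, which is the kind of skew the classes above are built on.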
LOCAL BIASES OF CODON USAGE
Correspondence analysis shows that genes with similar biases are functionally related. How is this reflected in the chromosome?
A clustering method based on information theory groups the genes into homogeneous families, which appear not to be randomly spread in the chromosome. The method identifies 4 classes in E. coli and 5 in B. subtilis.
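As an illustration of an information-theoretic comparison between genes, the sketch below computes the Jensen-Shannon divergence between the codon profiles of two sequences. The slide does not say which measure the authors' clustering method uses, so this choice is an assumption, and the sequences and function names are invented.

```python
from collections import Counter
from math import log2


def codon_profile(cds):
    """Relative codon frequencies of one in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 if identical, 1 if disjoint."""
    def kl(a, m):
        return sum(a[k] * log2(a[k] / m[k]) for k in a if a[k] > 0)

    keys = set(p) | set(q)
    mix = {k: (p.get(k, 0) + q.get(k, 0)) / 2 for k in keys}
    return (kl(p, mix) + kl(q, mix)) / 2


# Three invented sequences: g1 and g2 share most codons, g3 shares none.
g1 = codon_profile("AAAAAAGAA")
g2 = codon_profile("AAAAAAGAG")
g3 = codon_profile("CCCCCCGGG")
print(js_divergence(g1, g2), js_divergence(g1, g3))
```

Genes whose pairwise divergence is small would fall into the same family; any standard clustering procedure could then be run on the resulting distance matrix.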
Genes with similar bias are organized into groups longer than operons, showing some translation-driven organization of the chromosome
A major part of this effect comes from the recycling of rare transfer RNA molecules
Genomic translation islands
M. Bailly-Béchet
TRANSLATION ISLANDS
One group is associated with high expression (blue).
The other groups are also functionally consistent: horizontally transferred genes (red), motility (yellow) and intermediary metabolism (green).
M. Bailly-Béchet
M Bailly-Bechet, A Danchin, M Iqbal, M Marsili, M Vergassola. Codon usage domains over bacterial chromosomes. PLoS Computational Biology (2006) 2: April 20th
THE PALEOME: CONSTRUCTOR AND REPLICATOR
LOOKING FOR THE REPLICATOR AND THE CONSTRUCTOR
Are genes grouped randomly in the chromosomes?
Do we find different gene categories, in terms of the way they are organized?
At first sight, consistent with different DNA management processes in different organisms, not much is conserved, while genes transferred from other organisms are distributed throughout genomes
However, groups of genes such as operons or pathogenicity islands tend to cluster in specific places, and they code for proteins with common functions. « Persistent » genes are clustered together
PERSISTENT GENES
Laboratory essential genes are located in the leading strand. They are conserved in a majority of genomes. Conversely, the genes that are conserved and located in the leading strand make a particular category, which doubles the number of « essential » genes.
These genes make a universal category; 400-500 genes persist in a majority of bacterial genomes; they are involved not only in the three processes needed for life, but also in maintenance and in adaptation to transient phenomena; a fraction manages the evolution of the organism.
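The notion of persistence can be sketched as a simple count over a presence/absence table. This is a minimal illustration under stated assumptions: the real analysis also uses strand location and conservation criteria, the exact threshold used in the talk is not given, and the four genomes and their gene contents below are invented.

```python
def persistent_genes(presence, min_fraction=0.5):
    """Return gene families found in more than `min_fraction` of genomes.

    `presence` maps a genome name to the set of gene families it carries.
    The 0.5 default mirrors "a majority of genomes"; the threshold
    actually used by the authors is not stated in the slides.
    """
    n_genomes = len(presence)
    counts = {}
    for families in presence.values():
        for family in families:
            counts[family] = counts.get(family, 0) + 1
    return {f for f, n in counts.items() if n / n_genomes > min_fraction}


# Invented presence/absence data for four hypothetical genomes.
toy = {
    "genome_A": {"rpoB", "dnaA", "tetA"},
    "genome_B": {"rpoB", "dnaA"},
    "genome_C": {"rpoB", "dnaA", "flgE"},
    "genome_D": {"rpoB", "flgE"},
}
print(sorted(persistent_genes(toy)))  # prints ['dnaA', 'rpoB']
```

Families that fall below the threshold, such as the resistance or motility genes in this toy table, are the ones the talk later assigns to the variable part of the genome.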
GENE PERSISTENCE
Which functional category?
Information transfer
Compartmentalization
Intermediary metabolism
Stress, maintenance and repair
Highly non random!
PERSISTENT GENES ARE CLUSTERED
Persistent genes are functionally defined
The way they group along chromosomes in more than 250 bacteria displays three clusters that reflect a scenario of the origin of life. This is why it is proposed to name this group of core genes the paleome (from παλαιος, ancient)
A. Danchin. Homeotopic transformation and the origin of translation. Prog Biophys Mol Biol 1989, 54: 81-86.
Persistent genes recapitulate the origin of life
G. Fang
The external network, made of genes of intermediary metabolism (nucleotides and coenzymes, lipids), is highly fragmented; the middle network is built around class I tRNA synthetases; and the inner network, almost continuous and organized around the ribosome, transcription and replication, manages information transfers
A Danchin, G Fang, S Noria. The extant core bacterial proteome is an archive of the origin of life. Proteomics. (2007) 7:875-889
THE CENOME: THE “PURPOSE” OF THE MACHINE
THE COMPOSITE GENOME
Expecting two genome components, coding for the machine and for the “purpose” of the machine, we need to distinguish between the replicator/constructor and secondary functions
Extant genomes should comprise ubiquitous functions (not genes!) which would correspond to the former (here named the paleome) and functions specific to the environment of the organism (named the cenome, as in “biocenose”, to express the fact that these genes correspond to a specific niche)
THE CENOME
[Figure: linkage frequency plotted against frequency in genomes, from more frequent to more rare genes. Vertical bars group 50 genes with similar behaviour together; colored horizontal lines indicate the level of significance of linkage]
Figure labels: Known / Unknown; To live / To occupy a niche; Antibiotics (penicillin, vancomycin, isoniazid, rifampicin, streptomycin, tetracycline) / Virulence
Core genome: < 2000 genes
Variable genes: already > 50000 genes
The cenome (Greek: κοινος, common)
TOWARDS “SYNTHETIC BIOLOGY”
SYNTHESIS: A TALE OF TWO GENOMES
Life manifests itself first by growth and by the repair of weathering: the corresponding genome has existed since the origin; it is the paleome. Exploration of the environment is an inevitable consequence of existence; it results from continuous creation and exchange of the genes which form the cenome.
To live
To escape death
To live in context
A. Danchin. Archives or Palimpsests? Bacterial Genomes Unveil a Scenario for the Origin of Life. Biological Theory (MIT Press) (2007) 2: 52-61.
THANK YOU