Quantifying contributions of mutations and homologous recombination to E. coli genomic diversity...

Quantifying contributions of mutations and homologous recombination to E. coli genomic diversity Sergei Maslov Department of Biosciences Brookhaven National Laboratory, New York


Quantifying contributions of mutations and homologous recombination to E. coli genomic diversity

Sergei Maslov
Department of Biosciences
Brookhaven National Laboratory, New York


Bacterial genome evolution happens in cooperation with phages


Variation between E. coli strains

M Touchon et al. PLoS Genetics (2009)

Pan-genome of E. coli

Copy and Insert Copy and Replace

FW Studier, P Daegelen, RE Lenski, S Maslov, JF Kim, JMB (2009)

Comparison of B vs K-12 strains of E. coli


Usual suspects are there but do not explain heterogeneity

Negative correlation with protein abundance: 2.5% of variation, P-value = $10^{-5}$

Positive correlation with distance from origin of replication: 0.4% of variation, P-value = $10^{-2}$


High SNP numbers are clustered along the chromosome


Recombined

Clonal


P. Dixit, T. Y. Pang, Studier FW, Maslov S, PNAS submitted (2013)


SNPs by recombination / SNPs by clonal mutations: $r/\mu = 6 \pm 1$

Figure labels: Clonal regions, Recombined regions

P. Dixit, T. Y. Pang, Studier FW, Maslov S, PNAS submitted (2013)


Strains: K-12 vs. ETEC-H10407, HS, O157-H7-Sakai

Neutral model: mutations and recombinations among 70 “genes”, population of $10^4$

C. Fraser et al. (2007) and (2009)

P. Dixit, T. Y. Pang, Studier FW, Maslov S, PNAS submitted (2013)
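
A neutral model of this kind can be sketched as a toy Wright-Fisher simulation. The version below is only an illustration under simplifying assumptions: alleles are opaque labels (infinite-alleles mutation), recombination replaces a whole "gene" with a donor's copy, and the function names, rates, and population size are hypothetical rather than the ones used in the talk or in Fraser et al.

```python
import random

def simulate_neutral(pop_size=50, n_genes=70, generations=200,
                     mu=0.01, rho=0.01, seed=0):
    """Toy Wright-Fisher model with per-gene mutation and recombination.

    Each genome is a list of allele labels, one per "gene". All parameter
    values are illustrative assumptions, not the ones used in the talk.
    """
    rng = random.Random(seed)
    pop = [[0] * n_genes for _ in range(pop_size)]  # start fully clonal
    next_allele = 1
    for _ in range(generations):
        # Reproduction: each offspring copies a randomly chosen parent.
        pop = [list(rng.choice(pop)) for _ in range(pop_size)]
        for genome in pop:
            for g in range(n_genes):
                if rng.random() < mu:
                    # Mutation creates a brand-new allele (infinite alleles).
                    genome[g] = next_allele
                    next_allele += 1
                elif rng.random() < rho:
                    # Recombination overwrites the gene with a donor's copy.
                    genome[g] = rng.choice(pop)[g]
    return pop

def mean_pairwise_difference(pop):
    """Average fraction of genes that differ between two genomes."""
    total, pairs = 0.0, 0
    for i in range(len(pop)):
        for j in range(i + 1, len(pop)):
            diff = sum(a != b for a, b in zip(pop[i], pop[j]))
            total += diff / len(pop[i])
            pairs += 1
    return total / pairs
```

Calling `mean_pairwise_difference(simulate_neutral())` gives one sample of the pairwise diversity; a population of $10^4$ would need a faster implementation than this pure-Python loop.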


Phase transition Δc=1.5%

P. Dixit, T. Y. Pang, Studier FW, Maslov S, PNAS submitted (2013)


P. Dixit, T. Y. Pang, Studier FW, Maslov S, PNAS submitted (2013)


Why exponential tail?

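Time to coalescence: $\mathrm{Prob}(t) = \frac{1}{N_e}\left(1-\frac{1}{N_e}\right)^{t-1} \approx \frac{1}{N_e}\exp(-t/N_e)$

Exponential slope $= 1/(2\mu N_e)$, or $1/\theta$

Population size $N_e = (1 \pm 0.1) \times 10^9$, consistent with earlier estimates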
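
The step from the geometric coalescence probability to an exponential tail can be checked numerically. The sketch below assumes that the divergence between two lineages grows as $\delta = 2\mu t$ (each lineage accumulates mutations at rate $\mu$), so the tail in $\delta$ decays with slope $1/(2\mu N_e) = 1/\theta$. The function names and the values of `n_e` and `mu` are illustrative assumptions, not estimates from the talk.

```python
import math

def coalescence_pmf(t, n_e):
    """Exact probability that two lineages first share an ancestor t generations back."""
    return (1.0 / n_e) * (1.0 - 1.0 / n_e) ** (t - 1)

def coalescence_exp(t, n_e):
    """Exponential approximation of the same probability, valid for n_e >> 1."""
    return math.exp(-t / n_e) / n_e

def log_slope_in_divergence(mu, n_e):
    """Slope of -log P versus divergence delta = 2*mu*t, i.e. 1/theta."""
    return 1.0 / (2.0 * mu * n_e)

n_e, mu = 1000, 1e-4  # illustrative values only
for t in (1, 500, 2000):
    print(t, coalescence_pmf(t, n_e), coalescence_exp(t, n_e))
print("slope 1/theta =", log_slope_in_divergence(mu, n_e))
```

For these values the exact and approximate probabilities agree to well under one percent, which is why the histogram tail looks like a straight line on a semi-log plot.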


Why Ne << N ? Phages:

But: there are phages that cross species boundaries.

Also slope is similar for different species

Restriction modification system: Recombined segments are not continuous

[Milkman R, Bridges MM. Genetics 1990]

Recombination efficiency: need 20-30 identical bases to start recombination. Our slope predicts 60 bases, which roughly matches: 30 at the beginning and 30 at the end

Species are defined by recombination


Are our 30+ strains a representative sample?

Fully sequenced genomes: 1000s of genes (unbiased and complete) 10s of strains (biased)

MLST data: 10s of genes (biased) 1000s of strains (unbiased, I hope)

Database http://mlst.ucc.ie: ∼3000 E. coli strains, 7 short regions of ~500 base pairs each in housekeeping genes


MLST-- Genomes


Is it really phages?

Phage capacity: 20 kb

Other strains: up to 40 kb

K-12 to B comparison

1kb: gene length


Does neutral model explain everything?

At 3 standard deviations

• 19 1-kb regions are supervariable

• 29 1-kb regions are superconserved


Collaborators & funding
• Bill Studier (BNL)
• Purushottam Dixit (BNL)
• Tin Yau Pang (Stony Brook)
• Rich Lenski (Michigan State)
• Patrick Daegelen (France)
• Jinhyun Kim (Korea)

DOE Systems Biology Knowledgebase (KBase)
• Adam Arkin (Berkeley)
• Rick Stevens (Argonne)
• Bob Cottingham (Oak Ridge)
• Mark Gerstein (Yale)
• Doreen Ware (Cold Spring Harbor)
• Mike Schatz (Cold Spring Harbor)
• Dave Weston (ORNL)
• 60+ other collaborators


Thank you!


Genes encoded in bacterial genomes ~ Packages installed on Linux computers


• Complex systems have many components
  – Genes (Bacteria)
  – Software packages (Linux OS)
• Components do not work alone: they need to be assembled to work
• In individual systems only a subset of components is used
  – Genome (Bacteria): bag of genes
  – Computer (Linux OS): installed packages
• Components have vastly different frequencies of use


Justin Pollard, http://www.designboom.com

IKEA: has many components


Justin Pollard, http://www.designboom.com

They need to be assembled to work


Different frequencies of use

Common vs. rare


What determines the frequency of use?

• Popularity: AKA preferential attachment
  – Frequency ~ self-amplifying popularity
  – Relevant for social systems: WWW links, Facebook friendships, scientific citations
• Functional role:
  – Frequency ~ breadth or importance of the functional role
  – Relevant for biological and technological systems where selection adjusts undeserved popularity


Empirical data on component frequencies

• Bacterial genomes (eggnog.embl.de): 500 sequenced prokaryotic genomes, 44,000 orthologous gene families
• Linux packages (popcon.ubuntu.com): 200,000 Linux packages installed on 2,000,000 individual computers
• Binary tables: a component is either present or absent in a given system
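
As a small illustration of how a binary presence/absence table turns into component frequencies, the sketch below counts, for each component, the fraction of systems that contain it. The data and names are made up for the example; they are not drawn from eggNOG or popcon.

```python
def component_frequencies(systems):
    """Return {component: fraction of systems that contain it}."""
    counts = {}
    for components in systems.values():
        for c in set(components):  # presence/absence only, ignore extra copies
            counts[c] = counts.get(c, 0) + 1
    n = len(systems)
    return {c: k / n for c, k in counts.items()}

# Hypothetical toy data: three "genomes" and the gene families they carry.
toy = {
    "genome_A": ["ribosomal", "tca", "lac"],
    "genome_B": ["ribosomal", "tca"],
    "genome_C": ["ribosomal", "phage_like"],
}
freq = component_frequencies(toy)
```

Here `ribosomal` is present in every system (f = 1), while `lac` and `phage_like` each appear in one system out of three.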


Frequency distributions

$P(f) \sim f^{-1.5}$, except the top $\sqrt{N}$ “universal” components with $f \sim 1$

Plot labels: Cloud, Shell, Core, ORFans


How to quantify functional importance?

• Components do not work alone
  – Breadth/importance ~ component is needed for proper functioning of other components
• Dependency network
  – A → B means A depends on B for its function
  – Formalized for Linux software packages
  – For metabolic enzymes, given by upstream-downstream positions in pathways
• Frequency ~ dependency degree, $K_{dep}$
  – $K_{dep}$ = the total number of components that directly or indirectly depend on the selected one
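
The dependency degree defined above can be computed by reversing the dependency edges and counting everything reachable from a component. The following is a minimal sketch with hypothetical package names; it is not the code used for the Linux or metabolic data.

```python
def dependency_degree(depends_on):
    """K_dep for every component.

    depends_on maps each component to the components it directly needs
    (an edge A -> B means A depends on B). K_dep(B) counts all components
    that depend on B directly or indirectly.
    """
    nodes = set(depends_on)
    for deps in depends_on.values():
        nodes.update(deps)
    # Reverse the edges: for each B, which components need B directly?
    needed_by = {n: set() for n in nodes}
    for a, deps in depends_on.items():
        for b in deps:
            needed_by[b].add(a)
    k_dep = {}
    for start in nodes:
        seen, stack = set(), [start]
        while stack:  # depth-first walk over reversed edges
            for nxt in needed_by[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        seen.discard(start)  # a component never counts itself, even in a cycle
        k_dep[start] = len(seen)
    return k_dep

# Hypothetical package names, for illustration only.
toy = {"app": ["libgui", "libc"], "libgui": ["libc"], "tool": ["libc"], "libc": []}
k = dependency_degree(toy)
```

In this toy network `libc` has the largest dependency degree because every other package needs it, directly or through `libgui`.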


Frequency is positively correlated with functional importance
• Correlation coefficient ~0.4 for both Linux and genes
• Could be improved by using weighted dependency degree


Tree-like metabolic network

Kdep=5

Kdep=15

TCA cycle


Dependency degree distribution on a critical branching tree
• $P(K) \sim K^{-1.5}$ for a critical branching tree
• Paradox: $K_{\max}^{-0.5} \sim 1/N \Rightarrow K_{\max} = N^2 > N$
• Answer: parent tree size imposes a cutoff: there will be $\sqrt{N}$ “core” nodes with $K_{\max} = N$ present in almost all systems (ribosomal genes or core metabolic enzymes)
• Need a new model: in a tree $D = 1$, while in real systems $D \sim 2 > 1$


Dependency network evolution
• New components are added gradually over time
• A new component depends on $D$ existing components selected randomly
• $K_{dep}(t) \sim (t/N)^{-D}$
• $P(K_{dep}(t) > K) = P(t/N < K^{-1/D}) = K^{-1/D}$
• $P(K_{dep}) = K_{dep}^{-(1+1/D)} = K_{dep}^{-1.5}$ for $D = 2$
• $N_{universal} = N^{(D-1)/D} = N^{0.5}$ for $D = 2$
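
The growth rule on this slide is simple enough to simulate directly. The sketch below adds components one at a time, lets each depend on `d` randomly chosen earlier components, and then counts dependency degrees through a transitive closure stored as integer bitmasks. It rests on one assumption not stated on the slide: the first few components, which have fewer than `d` predecessors, depend on all of them. The function name and defaults are hypothetical.

```python
import random

def grow_dependency_network(n, d=2, seed=0):
    """Add n components one at a time; each depends on d random earlier ones.

    Returns K_dep for every component, in order of arrival.
    """
    rng = random.Random(seed)
    deps = [[]]  # component 0 has no dependencies
    for t in range(1, n):
        # Assumption: early components depend on all predecessors if fewer than d exist.
        deps.append(rng.sample(range(t), min(d, t)))
    closure = [0] * n  # closure[i]: bitmask of everything i needs, directly or not
    k_dep = [0] * n
    for i in range(n):  # dependencies always point to earlier, already closed nodes
        mask = 0
        for j in deps[i]:
            mask |= closure[j] | (1 << j)
        closure[i] = mask
        j = 0
        while mask:  # component i adds 1 to K_dep of every component it needs
            if mask & 1:
                k_dep[j] += 1
            mask >>= 1
            j += 1
    return k_dep
```

Sorting the returned list in decreasing order gives the rank (Zipf) plot of $K_{dep}$; the earliest components end up with the largest dependency degrees, and component 0 is needed by all others.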


Kdep decreases with layer number

Linux Model with D=2


Zipf plot for Kdep distributions

Metabolic enzymes vs. model

Linux vs. model


Frequency distributions

$P(f) \sim f^{-1.5}$, except the top $\sqrt{N}$ “universal” components with $f \sim 1$

Plot labels: Cloud, Shell, Core, ORFans


Why should we care about P(f)?


Metagenomes and pan-genomes

The Human Microbiome Project Consortium, Nature (2012)

For $P(f) \sim f^{-1.5}$: (pan-genome size) $\sim$ (# of samples)$^{0.5}$
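
The square-root growth of the pan-genome can be checked numerically. The sketch below assumes that each component is present in a sampled system independently with probability f, so a component is seen at least once in n samples with probability $1-(1-f)^n$. Frequencies are placed at evenly spaced quantiles of a power law on $[f_{min}, 1]$; the cutoff `f_min`, the number of components, and the function names are illustrative assumptions.

```python
import math

def power_law_frequencies(m, f_min=1e-6, exponent=1.5):
    """m frequencies at evenly spaced quantiles of P(f) ~ f**(-exponent) on [f_min, 1]."""
    a = exponent - 1.0  # 0.5 for the f^-1.5 law
    lo = f_min ** (-a)
    # Inverse CDF of the truncated power law, evaluated at u = (i + 0.5) / m.
    return [(lo - (i + 0.5) / m * (lo - 1.0)) ** (-1.0 / a) for i in range(m)]

def expected_pan_genome(freqs, n_samples):
    """Expected number of distinct components seen in n_samples systems."""
    return sum(1.0 - (1.0 - f) ** n_samples for f in freqs)

freqs = power_law_frequencies(100_000)
small = expected_pan_genome(freqs, 10)
large = expected_pan_genome(freqs, 1000)
# Effective exponent of pan-genome growth between 10 and 1000 samples.
scaling_exponent = math.log(large / small) / math.log(1000 / 10)
```

The fitted exponent comes out close to 0.5, with small deviations caused by the cutoffs at f = 1 and f = `f_min`.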


Pan-genome of E. coli strains

M Touchon et al. PLoS Genetics (2009)


Genome evolution in E. coli

Studier FW, Daegelen P, Lenski RE, Maslov S, Kim JF, J. Mol Biol. (2009)

P. Dixit, T. Y. Pang, Studier FW, Maslov S, submitted (2013)


How many transcription factors does an organism need?

Regulator genes Worker genes

S. Maslov, TY Pang, K. Sneppen, S. Krishna, PNAS (2009)

TY Pang, S. Maslov, PLoS Comp Bio (2011)


Figure adapted from S. Maslov, TY Pang, K. Sneppen, S. Krishna, PNAS (2009)

$N_R \sim N_G^2$, so $N_R/N_G \sim N_G$


Parkinson's Law

The report of the Royal Commission on the Civil Service was published on Thursday afternoon. Time has not permitted any comment in this week's issue of The Economist on the contents of the Report. But the startling discovery enunciated by a correspondent in the following article is certainly relevant to what should have been in it.

Nov 19th 1955 | From The Economist print edition

“… bureaucracy grew by 5-7% per year "irrespective of any variation in the amount of work (if any) to be done."”

Why?
1) "An official wants to multiply subordinates, not rivals"
2) "Officials make work for each other."

so that “Work expands so as to fill the time available for its completion”

Is this what happens in bacterial genomes? Probably not!

Cyril Northcote Parkinson (1909-1993)


Economies of scale in bacterial evolution

• $N_R = N_G^2/80{,}000$, so $N_G/N_R = 80{,}000/N_G$

• Economies of scale: as the genome gets larger, new pathways get shorter
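
A quick worked example makes the scaling concrete: plugging a few genome sizes into the fit shows how the number of genes per regulator shrinks as the genome grows. The genome sizes below are illustrative round numbers, and the function name is hypothetical.

```python
def regulators(n_genes, scale=80_000):
    """Number of regulators from the fit N_R = N_G**2 / 80,000."""
    return n_genes ** 2 / scale

for n_g in (1_000, 4_000, 8_000):  # illustrative genome sizes
    n_r = regulators(n_g)
    print(f"N_G={n_g}: N_R={n_r:.1f}, genes per regulator={n_g / n_r:.0f}")
```

For a genome of 4,000 genes the fit gives 200 regulators, or 20 genes per regulator; doubling the genome to 8,000 genes halves that ratio to 10.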


nutrient

Horizontal gene transfer: entire pathways could be added in one step

nutrient

Redundant enzymes are removed

Central metabolic core anabolic pathways biomass production


Adapted from “scope-expansion” algorithm by R. Heinrich et al.

Minimal metabolic pathways from reactions in KEGG database

(# of pathways or their regulators) ~ (# of enzymes)$^2$

Plot axes: $N_G$, $N_R$
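
The general idea behind scope expansion can be sketched in a few lines: starting from a seed set of metabolites, repeatedly fire every reaction whose substrates are all available and add its products to the scope. This is a generic illustration of that idea, not the adapted algorithm used for the KEGG analysis; the function name, data layout, and toy reactions are assumptions.

```python
def expand_scope(seed_metabolites, reactions):
    """Grow the set of reachable metabolites from a seed set.

    reactions is a list of (substrates, products) pairs. A reaction fires
    once all of its substrates are available; its products then join the scope.
    """
    scope = set(seed_metabolites)
    fired = set()
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for idx, (substrates, products) in enumerate(reactions):
            if idx not in fired and set(substrates) <= scope:
                fired.add(idx)
                scope.update(products)
                changed = True
    return scope, fired

# Hypothetical toy network, not taken from KEGG.
toy_reactions = [
    (["glucose"], ["g6p"]),
    (["g6p"], ["pyruvate"]),
    (["pyruvate", "cofactor_x"], ["product_y"]),  # blocked: cofactor_x is never made
]
scope, fired = expand_scope(["glucose"], toy_reactions)
```

In the toy network the first two reactions fire and the third stays blocked, so `product_y` never enters the scope.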


What does it all mean for regulatory networks?
• Scale-free regulatory networks with “hubs” due to power law distribution of branch sizes: $P(S) \sim S^{-3}$
• Trends in complexity of regulation vs. genome size
  – $N_R \langle K_{out} \rangle = N_G \langle K_{in} \rangle$ = number of regulatory interactions, E. van Nimwegen, TIG (2003)
  – $N_R/N_G = \langle K_{in} \rangle / \langle K_{out} \rangle$ increases with $N_G$
  – Either $\langle K_{out} \rangle$ decreases with $N_G$: functions become more specialized
  – Or $\langle K_{in} \rangle$ grows with $N_G$: regulation gets more coordinated & interconnected
  – Most likely both trends at once
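
The identity between regulators, genes, and average degrees is just two ways of counting the same edges, which a toy edge list makes explicit. The network below is hypothetical, and the function name is an assumption for the example.

```python
def regulatory_averages(edges, n_genes):
    """edges: (regulator, target) pairs; n_genes: total number of genes N_G."""
    regulators = {r for r, _ in edges}
    n_r = len(regulators)
    k_out = len(edges) / n_r      # average number of targets per regulator
    k_in = len(edges) / n_genes   # average number of regulators per gene
    return n_r, k_out, k_in

# Hypothetical toy network: 2 regulators, 5 genes in total.
edges = [("tf1", "g1"), ("tf1", "g2"), ("tf1", "g3"), ("tf2", "g3")]
n_r, k_out, k_in = regulatory_averages(edges, n_genes=5)
```

Both products, `n_r * k_out` and `5 * k_in`, equal the number of edges (4), so the ratio of regulators to genes equals `k_in / k_out`.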


nutrient

TF1

nutrient

TF2

Regulatory templates: one worker – one boss

<Kout>: <Kin>=1=const


nutrient

TF1

nutrient

Regulatory templates: long top-to-bottom regulation

<Kout>=const, <Kin>:

TF2

<Kout>:, <Kin>:


nutrient

nutrient

TF1

TF2

Regulatory templates: hierarchy & middle management

TF3


Histogram of the # of SNPs in genes

FW Studier, P Daegelen, RE Lenski, S Maslov, JF Kim, JMB (2009)

Comparison of B vs K-12 strains of E. coli

• 50% of genes have very few SNPs
  – 1253: 0 SNPs
  – 445: 1 SNP
  – 232: 2 SNPs
• The remaining 50% are in an exponential tail up to 100 SNPs (10% divergence) and higher