Microbial Genomics


Dr M. D-S, 2007

Topics

• Describe the new area of genomics

• Outline the rapid progress in genomic sequencing

• Describe the analysis of sequences - bioinformatics

• Show the use of genomics in the study of microbes

• Use the sequence of a human pathogen, Escherichia coli O157:H7, to illustrate the above points

Ref: Perna et al. (2001) Nature 409:529 (USA)

Relevant to next lectures.

Microbial genome sequences

GenBank (NCBI), Bethesda, Maryland, USA: the main genomic database

Completed microbial genomes: 481 (2007), 319 (2006), 112 (2003)

Sizes range from 0.58 Mb to over 9 Mb

There is some duplication...

Genomics

The study of entire genomes of organisms

• assumes the entire sequence of at least one representative example has been determined

• includes study of all the genes, gene products and non-coding regions

• includes study of genome organisation and evolution

The explosion of ‘-ome’ and ‘-omics’ words

Functional genomics, proteome, transcriptome, metabolome, glycome, lipidome

e.g. a recent journal article with the title: “Functional genomics by integrated analysis of metabolome and transcriptome of Arabidopsis”

Genomics

What can microbial genomics tell us?

• Full gene complement of the cell

• Complete description of cell metabolism

• How genomes are structured

• Virulence genes

• Potential drug targets

• Gene flow between cells (evolution)


Genome Sequencing: Two methods

1. Sanger di-deoxy sequencing (using fluorescently labelled ddNTPs) on cloned DNA templates.

2. Pyro-sequencing method on 454 machine using uncloned DNA templates

Genome Sequencing: Two methods

1. Sanger di-deoxy sequencing (using fluorescently labelled ddNTPs) on cloned DNA templates. ‘Shotgun’ strategy.

Dye-terminator chemistry, ABI sequencing apparatus, commercial software for handling sequence data

Genomic sequencing methods

chDNA (chromosomal DNA)

Shear DNA & isolate fragments of about 2 kb

Clone thousands of fragments into a plasmid vector (library). Prepare DNA for sequencing

Dideoxy chain termination

[Figure: dideoxy chain termination. Source: http://www.plattsburgh.edu/acadvp/artsci/biology/bio401/DNASeq.html]

Sequence: methods section

Applied Biosystems Inc (ABI) latest sequencing machine, PE 3700

Capillary electrophoresis

96 capillaries at a time

Robotically loaded and run (24 hr)

How many bp can it do in a day?

- each run is 2 hr, giving 600-1000 nt per capillary, 96 capillaries/run

Sequence: methods section

Applied Biosystems Inc (ABI) latest sequencing machine, PE 3700

How many bp can it do in a day?

- each run is 2 hr, about 800 bp per lane, 96 lanes

= (24/2) × 800 × 96 = 921,600 bp

Or about 1 Mb/machine/day
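The daily throughput estimate can be checked with a few lines of Python. This is a minimal sketch using only the figures quoted on the slide (2 hr per run, about 800 bp per lane, 96 lanes). The function name and the assumption of back-to-back runs with no loading time are illustrative, not part of the lecture.

```python
def bases_per_day(run_hours=2, bases_per_lane=800, lanes=96):
    """Rough daily output of one capillary sequencer, assuming back-to-back runs."""
    runs_per_day = 24 // run_hours  # 12 runs if each takes 2 hr
    return runs_per_day * bases_per_lane * lanes


daily = bases_per_day()
print(f"{daily:,} bp/day (~{daily / 1e6:.1f} Mb)")

# The slide quotes 600-1000 nt per capillary, so bracket the estimate.
for read_length in (600, 1000):
    print(read_length, "nt reads:", f"{bases_per_day(bases_per_lane=read_length):,}", "bp/day")
```

With 800 bp reads the product is 921,600 bp, which is where the "about 1 Mb/machine/day" figure comes from.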

Sequence data

Laser scanning of the 96 capillary tubes identifies the colour and positions of the closely spaced bands of ssDNA.

[Figure: capillary tubes (top marked, + and - electrodes) with the resulting base calls, e.g. TAATCATGGTC....]

Shotgun sequencing: how much do you need to do?

~ 1 Mb/machine/day

Want both strands, with good sequence for both; random coverage means you will need 6-8x the genome size in sequence data

Speed makes it efficient?

The counter-argument is the difficulty in linking up reads, particularly when genomes have long repeat sequences.
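To see why 6-8x is a sensible target, one common idealisation is the Lander-Waterman model. It is not part of the lecture and is used here as an assumption: read starts are treated as random, so the fraction of the genome left unsequenced at coverage c is roughly e^-c. The sketch below applies it to a genome of about 5.5 Mb and the ~1 Mb/machine/day figure; the function names are illustrative.

```python
import math


def expected_uncovered_fraction(coverage):
    """Lander-Waterman idealisation: P(base not covered) ~ exp(-coverage)."""
    return math.exp(-coverage)


def machine_days(genome_bp, coverage, bp_per_day=1_000_000):
    """Days of sequencing for one machine to reach the given coverage."""
    return genome_bp * coverage / bp_per_day


genome_bp = 5_500_000  # roughly the size of E. coli O157:H7
for c in (1, 6, 8):
    missing = expected_uncovered_fraction(c) * genome_bp
    days = machine_days(genome_bp, c)
    print(f"{c}x: ~{missing:,.0f} bp uncovered, {days:.0f} machine-days")
```

The model ignores repeats and cloning bias, so real projects can still end up with gaps even at high coverage.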

Genome Sequencing: Two methods

In the E. coli O157:H7 genome sequence paper by Perna et al., 2 gaps remained in the genome sequence! They couldn’t complete it.

“Extended exact matches pose a significant assembly problem.” ??

Repeat sequences, e.g. prophage genomes

Nearly identical prophage sequences at 3 locations on genome, all > 2000 nt

What sequences do you observe when inside a prophage genome?

Repeat sequences, e.g. prophage genomes

Nearly identical prophage sequences at 2 locations on genome

What sequences do you see going across the borders of prophages?

Repeat sequences, e.g. prophage genomes

Nearly identical prophage sequences at 2 locations on genome

What information do you need to place the repeats properly?
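The repeat problem can be illustrated with a toy example. In this sketch every sequence is invented for illustration, and exact string matching stands in for a real assembler's overlap step. A read lying wholly inside a repeated element matches the genome in more than one place, while a read that crosses the border into unique flanking DNA matches only once.

```python
def find_all(genome, read):
    """Return every start position at which `read` matches `genome` exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


# Toy sequences, invented for illustration only.
flank_a = "ATGGCGTACCTTGA"
repeat = "GGCTAACGTTAGCCGGATATCC"  # stands in for a prophage copy
flank_b = "TTCAGGACTCAAGT"
flank_c = "CACGTTGGAACTGA"
genome = flank_a + repeat + flank_b + repeat + flank_c

inside_read = repeat[4:16]               # lies wholly within the repeat
border_read = flank_a[-6:] + repeat[:6]  # spans a repeat border

print("inside:", find_all(genome, inside_read))
print("border:", find_all(genome, border_read))
```

Reads of the second kind, or paired reads from the two ends of one clone, carry the information needed to decide which copy of the repeat a sequence belongs to.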

Genome Sequencing: Two methods

1. Sanger di-deoxy sequencing (using fluorescently labelled ddNTPs) on cloned DNA templates.

2. Pyro-sequencing method on 454 machine using uncloned DNA templates

The 454 machines: the next revolution

www.454.com

40 million bases/5.5 hr

DNA immobilised on micro-beads

Positioned in wells of a special tray (44 µm diameter, 1.2 million per chip)

Sequencing enzymes on smaller beads.

Only one DNA-bead can fit in each well

Each bead has only one DNA fragment attached, so will give unique sequence.

When a base is incorporated (by DNA polymerase), light is emitted, and the light is detected under each well. If several identical bases are incorporated in one step, the light is proportional to their number. Chain lengths of 200 nt are possible. With 200,000 wells and 200 nt/well, 40 million bases can be sequenced per run (40 million bases/5.5 hr).
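The idea that light intensity encodes the number of bases incorporated can be sketched as a small decoder. This is a simplified illustration, not the vendor's base-calling software: the flow order `TACG`, the signal values and the function name are all assumptions, and real signals need noise and phasing correction.

```python
def decode_flowgram(intensities, flow_order="TACG"):
    """Turn per-flow light signals into a base sequence.

    Each flow supplies one nucleotide type; a signal near n means n identical
    bases were incorporated in that flow (0 means none).
    """
    bases = []
    for i, signal in enumerate(intensities):
        nucleotide = flow_order[i % len(flow_order)]
        bases.append(nucleotide * round(signal))
    return "".join(bases)


# Hypothetical normalised signals for two cycles of T, A, C, G flows.
signals = [1.02, 0.05, 1.97, 0.96, 0.03, 2.9, 0.1, 1.1]
print(decode_flowgram(signals))

# Run yield from the figures on the slide.
wells, read_length = 200_000, 200
print(f"{wells * read_length:,} bases per run")
```

At 40 million bases per 5.5 hr run, one machine produces roughly 7 Mb per hour, against about 1 Mb per day for the capillary machine.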

Genomics

Papers are filled with JARGON, mainly genetic terms. Some terms are relatively new (e.g. replichore)

Use the E. coli paper as an example, stopping to investigate each new term or concept

Emphasise the uses of this data, and the future of genomic research.

What do you know about microbial genomes?

Exercise: Think of a typical bacterial genome, like that of E. coli, and

- Sketch the genome and the most significant features you know about it (as a whole genome, not individual genes)

- Jot down what you think the main selective pressures are on it

Escherichia coli genome

Circular, ~ 4.6 Mb

Ori and Ter, bidirectional replication

Replichores about equal

[Figure labels: oriC, ter]

Replichore ‘balance’?

If you move oriC relative to Ter, the growth rate of E. coli K-12 is reduced.

Chromosomal inversions around the origin or terminus of replication are usually symmetrical, conserving the replichore balance.

Hill, C. W., and J. A. Gray. 1988. Effects of chromosomal inversion on cell fitness in Escherichia coli K-12. Genetics 119:771–778.

Eisen, J. A., J. F. Heidelberg, O. White, and S. L. Salzberg. 2000. Evidence for symmetric chromosomal inversions around the replication origin in bacteria. Genome Biol. 1:0011.1–0011.9
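Replichore balance is easy to quantify once the positions of the origin and terminus are known. The sketch below uses round, hypothetical coordinates on a 4.6 Mb circle rather than the annotated E. coli positions, and the function names are illustrative.

```python
def replichore_lengths(genome_length, ori, ter):
    """Lengths of the two replichores of a circular chromosome.

    One replichore runs from ori to ter in the direction of increasing
    coordinate (wrapping around if needed); the other covers the rest.
    """
    clockwise = (ter - ori) % genome_length
    return clockwise, genome_length - clockwise


def imbalance(genome_length, ori, ter):
    """Difference between the replichore lengths as a fraction of the genome."""
    a, b = replichore_lengths(genome_length, ori, ter)
    return abs(a - b) / genome_length


length = 4_600_000
print(replichore_lengths(length, ori=3_900_000, ter=1_600_000))  # balanced toy case
print(replichore_lengths(length, ori=3_000_000, ter=1_600_000))  # origin displaced
print(f"{imbalance(length, 3_000_000, 1_600_000):.2f}")
```

An inversion that is symmetrical about the origin or terminus moves genes but leaves both lengths unchanged, which is one way to read the observation that such inversions conserve the balance.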

E. coli genome - global features

Gene dosage

Gene direction relative to ori

Recombination/inversion rates vary around the chromosome

Gene Dosage

Genes near the origin of replication will almost always be in multiple copy compared to genes near the terminus

So the position of a gene relative to the origin will affect its expression, and the regulatory systems would have evolved to accommodate the gene dosage effect.

So what would happen if you moved genes?

[Figure labels: oriC, ter]
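The size of the gene dosage effect can be estimated with the Cooper-Helmstetter picture of the bacterial cell cycle. That model is not described in the lecture and is brought in here as an assumption. In a steadily growing culture, a gene at fractional distance m from the origin is present at about 2^(C(1-m)/τ) copies relative to a gene at the terminus, where C is the time needed to replicate the chromosome and τ is the doubling time. The 40 min value for C and the doubling times below are illustrative.

```python
def relative_dosage(m, c_period_min=40.0, doubling_time_min=20.0):
    """Average copy number of a gene relative to a gene at the terminus.

    m is the gene's fractional distance from the origin along its replichore
    (0 = at oriC, 1 = at ter). Under the Cooper-Helmstetter model the ratio
    is 2 ** (C * (1 - m) / tau).
    """
    return 2 ** (c_period_min * (1 - m) / doubling_time_min)


for tau in (20, 60):
    ratio = relative_dosage(0.0, doubling_time_min=tau)
    print(f"doubling time {tau} min: ori/ter dosage ratio ~ {ratio:.2f}")
```

Under these assumptions the effect is strongest in fast-growing cells, so moving a gene from near the terminus to near the origin would raise its copy number most when growth is rapid.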

Gene Direction

What happens when a DNA pol meets an RNA pol going in the opposite direction?

[Figure labels: DNA polymerase, RNA polymerase]

Gene Direction

What happens when a DNA pol meets an RNA pol going in the opposite direction?

This is better….

[Figure labels: DNA polymerase, RNA polymerase]

Gene Direction

A preference for genes to be on ONE strand of the replichore, so that the direction of transcription and replication are the same.

This bias may have other implications.

[Figure label: ori]
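Whether a gene is transcribed in the same direction as replication follows from its position, its strand, and the locations of ori and ter. The sketch below is a minimal illustration on an invented 1000 bp circular genome; the gene list and function name are hypothetical.

```python
def is_codirectional(position, strand, ori, ter, genome_length):
    """True if a gene is transcribed in the direction its replication fork moves.

    strand is '+' (transcribed towards increasing coordinates) or '-'.
    Forks leave ori in both directions and meet at ter.
    """
    offset = (position - ori) % genome_length
    on_clockwise_replichore = offset < (ter - ori) % genome_length
    # Clockwise fork moves towards increasing coordinates, so '+' genes match it;
    # the other fork moves towards decreasing coordinates, so '-' genes match it.
    return (strand == "+") == on_clockwise_replichore


# Hypothetical genes: (position, strand).
genes = [(100, "+"), (300, "+"), (450, "-"), (600, "-"), (800, "-"), (900, "+")]
ori, ter, length = 0, 500, 1000
co = sum(is_codirectional(p, s, ori, ter, length) for p, s in genes)
print(f"{co}/{len(genes)} genes co-oriented with replication")
```

Run over a real annotation, a count like this would show how strong the strand preference is in a given genome.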

Recombination/inversions

Genomes often have large repeated sequences, e.g. ribosomal RNA gene clusters (16S-23S-5S), or phage genomes.

Such repeats allow large inversions of DNA segments or recombination between chromosomes

Inversion via repeated sequences

Homologous recombination between rRNA genes
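At the sequence level, an inversion replaces a segment with its reverse complement. The toy example below assumes two copies of a repeat in inverted orientation flanking a unique segment. All sequences are invented, and the exact crossover point within the repeats is abstracted away.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string written in upper-case A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_between(genome, start, end):
    """Invert genome[start:end], as recombination between flanking inverted repeats would."""
    return genome[:start] + reverse_complement(genome[start:end]) + genome[end:]


# Toy layout: flank, repeat, unique middle, inverted repeat copy, flank.
repeat = "GATTACA"
middle = "CCGGTTT"
genome = "AAAA" + repeat + middle + reverse_complement(repeat) + "GGGG"

inverted = invert_between(genome, 11, 18)  # the unique middle only
print(genome)
print(inverted)
```

Because the two repeat copies are reverse complements of each other, inverting the region including the repeats (positions 4 to 25) gives the same result. This is why the repeats themselves look unchanged after the event.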

[Figure labels: origin, terminus, GC-skew, Chi sequences]

Genomics: What is GC-skew?

Systematic bias in base composition of one strand as you go around the genome

GC skew = [G-C] / [G+C]

[Figure: GC skew plotted along the genome, with origin and ter marked]

GC-skew of genomes

GC-skew of genomic DNA

Compositional bias:

Leading strand enriched in G/T (keto)

Lagging strand enriched in C/A (amino)

WHY?

Perhaps due to deamination of exposed C’s in the leading strand, producing C>T mutations. Theory only.
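GC skew is simple to compute from a sequence. The sketch below implements the [G-C]/[G+C] ratio over consecutive windows. The toy sequence is invented so that the skew changes sign halfway along, and the window size and function names are arbitrary choices.

```python
def gc_skew(seq):
    """(G - C) / (G + C) for one stretch of sequence; 0.0 if it has no G or C."""
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def windowed_skew(genome, window):
    """GC skew of consecutive, non-overlapping windows along one strand."""
    return [gc_skew(genome[i:i + window]) for i in range(0, len(genome), window)]


# Toy sequence, invented for illustration: a C-rich half followed by a G-rich half.
toy = "ACCT" * 10 + "AGGT" * 10
print(windowed_skew(toy, 20))
```

In real bacterial genomes the sign of the skew typically switches near the origin and the terminus, which is why such plots are used to locate them.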


E. coli O157:H7 - K12 genome comparison: Chi sequences

GCTGGTGG

Sequence recognised by the RecBCD (RecBC) enzyme, which cuts the DNA close to it

Promotes homologous recombination (by RecA)
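Because Chi is a fixed octamer, locating Chi sites is a plain motif search on both strands. This is a minimal sketch on an invented sequence; the function names are illustrative.

```python
CHI = "GCTGGTGG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string written in upper-case A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def find_chi_sites(seq):
    """Return (position, strand) for every Chi octamer on either strand."""
    sites = []
    for strand, motif in (("+", CHI), ("-", reverse_complement(CHI))):
        start = seq.find(motif)
        while start != -1:
            sites.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(sites)


# Toy sequence with one Chi site on each strand.
toy = "AAGCTGGTGGTTTTCCACCAGCAA"
print(find_chi_sites(toy))
```

Plotting the strand of each hit against genome position is one way to look for an orientation bias relative to the origin.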

Lateral Gene Transfer (LGT)

• Literally, the natural transfer of genetic material between different organisms (species, genera, etc)

• Doesn’t say how the DNA was transferred or integrated, or where it came from.

• Does imply that the DNA can be identified as ‘foreign’

• Since DNA doesn’t have a ‘made in X’ sticker, how can the ‘foreignness’ be identified? …. Ideas?….
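One commonly used heuristic, offered here as an illustration rather than as the lecture's answer, is compositional bias: recently acquired DNA often has a GC content unlike the rest of the genome. The sketch below flags windows whose GC content departs from the genome-wide mean. The toy genome, window size and threshold are all invented.

```python
def gc_content(seq):
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_windows(genome, window, threshold):
    """Start positions of windows whose GC content differs from the mean by more than threshold."""
    starts = range(0, len(genome) - window + 1, window)
    values = [gc_content(genome[s:s + window]) for s in starts]
    mean = sum(values) / len(values)
    return [s for s, v in zip(starts, values) if abs(v - mean) > threshold]


# Toy genome: an AT-rich 50 bp segment embedded in 50% GC sequence.
host = "GATC" * 25
island = "AATTA" * 10
genome = host + island + host
print(atypical_windows(genome, window=50, threshold=0.2))
```

Composition alone is weak evidence, since foreign DNA from a donor with similar GC content would not be flagged. In practice it is usually combined with other signals, such as codon usage or absence from related strains.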

Lateral Gene Transfer (LGT)

Known mechanisms of DNA transfer between bacteria:

• Transduction: transducing bacteriophages introduce host DNA, and this recombines with the genome

• Transformation: DNA uptake from the surroundings, and recombination.

• Conjugation: natural transfer method, sex pilus, one-way transfer, recombination.

Prophage

Bacteriophages that are temperate (as compared to lytic) can exist inside host cells in a stable and relatively inactive state as prophages.

• The host cell, with a prophage, is called a lysogen.

• Some prophages express virulence determinants, such as toxins (= lysogenic conversion), e.g. Shiga toxin

• Some prophages exist as plasmids, but most integrate into the genome.

• If the prophage becomes damaged…. ?

E. coli genome sequences

STRAIN                        SIZE          DATE
E. coli K12                   4639221 bp    Oct 13, 1998
E. coli O157:H7 (USA)         5528970 bp    Jan 25, 2001
E. coli O157:H7 (Japanese)    5498450 bp    Mar 7, 2001

*about 4.1 Mb in common

Data from NCBI: http://www.ncbi.nlm.nih.gov:80/PMGifs/Genomes/eub.html

E. coli O157:H7 - K12 genome comparison

• Unexpected complex segmented relationship

• Share a 4.1 Mb ‘backbone’ of common, generally colinear sequence (only 1 inversion)

• Homologous sequences are interspersed with HUNDREDS of ISLANDS of INTROGRESSED DNA

[Figure: segment orders A B C D and A B X C D compared]

E. coli O157:H7 - K12 genome comparison

The DNA segments specific to each strain were named ‘O-islands’ (O157:H7-specific DNA segments) or ‘K-islands’ (K12-specific DNA segments)

Backbone of 4.1 Mb common sequence. Not identical (e.g. 75% of proteins differ by at least one aa).

O-islands total 1.34 Mb (about 26% of genes!)

Largest O-island is a 106-gene region (not small!)
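The figures quoted in the lecture can be cross-checked against each other with some quick arithmetic. This sketch uses only numbers from the slides (genome sizes, the 4.1 Mb backbone and the 1.34 Mb of O-islands). The K-island total it derives is an inference from those rounded values, not a number taken from the paper.

```python
O157_BP = 5_528_970       # E. coli O157:H7 (USA)
K12_BP = 4_639_221        # E. coli K12
BACKBONE_BP = 4_100_000   # shared backbone, rounded as on the slide
O_ISLANDS_BP = 1_340_000  # total O-island sequence

o_island_fraction = O_ISLANDS_BP / O157_BP
implied_k_islands = K12_BP - BACKBONE_BP
unaccounted = O157_BP - BACKBONE_BP - O_ISLANDS_BP

print(f"O-islands: {o_island_fraction:.1%} of the O157:H7 sequence")
print(f"Implied K-island total: ~{implied_k_islands / 1e6:.2f} Mb")
print(f"Left over after rounding: ~{unaccounted / 1e6:.2f} Mb")
```

The O-islands make up about a quarter of the O157:H7 sequence, in line with the roughly 26% of genes they carry. The small remainder reflects the rounding of the backbone size.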

E. coli O157:H7 - K12 genome comparison

Virulence genes do not seem to be concentrated in one particular ‘island’; there appear to be several

Often (189 cases), the backbone-island junction is WITHIN an ORF.

[Figure labels: protein-coding ORF, AUG, UGA, O-island]

What does this pattern suggest?

E. coli O157:H7 - K12 genome comparison

Suggests that incoming DNA recombined with the genome (somehow?) rather than inserted.

[Figure labels: protein-coding ORF, AUG, UGA, O-island]

Comparative Genome Map

• Distribution of O-islands of EDL933-specific sequence (red), ‘K-islands’ of K12-specific sequence (green) and common ‘backbone’ sequence (blue)

• GC-content of genes, plotted around the mean

• GC-skew for 3rd codon positions

• Scale, in base pairs

• Octamer Chi sequences

Genome sequence - Figure 2

• O-specific ‘islands’

• K-specific ‘islands’

• Scale (10 kb/tick)

• O157:H7 genes and their orientation

Genome sequence - Figure 2

CP-933 = Cryptic Prophage. Also an O-island

How many kb is this phage genome?

E. coli O157:H7 genome sequence

Summary of main findings:

1. Many insertions of DNA around chromosome

2. Inserted DNA is foreign (HGT or Lateral GT)

3. Several virulence gene clusters; widely spread

4. Prophage genomes prominent

5. Systematic variations in base composition: coding strand, GC skew, Chi sequences

E. coli O157:H7 genome sequence

Summary of main findings:

6. E. coli O157:H7 undergoes relatively high rates of recombination and mutation.

- where is the DNA coming from? Unknown, phage, mobile elements (e.g. transposons)

- what is the main method of transfer?

- is defective DNA mismatch repair important?

E. coli O157:H7 genome sequence

Summary of main findings:

These large differences can be exploited:

• Diagnostic tools (discriminate between E. coli strains)

• New virulence gene candidates can be tested for function, and new drugs developed

• Effects of antibiotics on toxin synthesis examined

Note: in the genome sequences of many microbes, the percentage of ORFs that cannot be identified is often > 20%