Identifying conserved segments in rearranged and divergent genomes
![Page 1: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/1.jpg)
Bob Mau, Aaron Darling, Nicole T. Perna
Presented by Aaron Darling
![Page 2: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/2.jpg)
Comparing genomic architectures
Genome sequence and architecture comparison can lead to insight about:
• Evolutionary forces
• Gene functions
• Organismal phenotypes
Rearrangement, gene gain, loss, and duplication obfuscate homology
![Page 3: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/3.jpg)
Structure of the bacterial chromosome
[Figure: chromosome diagram marking the origin of replication and the terminus]
Replication proceeds simultaneously on each “replichore” (the two chromosome arms running from the origin to the terminus)
Breakpoints of inversions tend to occur at equal distances from the origin, which maintains replichore balance
(Tillier and Collins 2000, Ajana et al. 2002)
We call such rearrangements “symmetric inversions”
A replichore size difference > 20% is selected against (Guijo et al. 2001)
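Why breakpoints equidistant from the origin preserve balance can be checked with a few lines of arithmetic. The sketch below is a hypothetical toy model, not the authors' code. It places the origin at position 0 of a circular chromosome of length `genome_len`, reflects the terminus through an inversion that spans it, and reports the resulting replichore lengths. Two choices are assumptions made only for illustration: the size difference is measured as |r1 − r2| divided by the larger replichore, and 0.20 is used as the cutoff.

```python
def replichores_after_inversion(genome_len: int, terminus: int,
                                start: int, end: int) -> tuple[int, int]:
    """Return (replichore_1, replichore_2) lengths after inverting [start, end].

    Toy model: the origin sits at position 0 of a circular chromosome and the
    inversion spans the terminus but not the origin. Inverting maps a position
    p in [start, end] to start + end - p, so the terminus moves there too.
    """
    if not (0 < start < terminus < end < genome_len):
        raise ValueError("this sketch only handles inversions spanning the terminus")
    new_terminus = start + end - terminus
    # Replichore 1 runs from the origin to the terminus; replichore 2 is the rest.
    return new_terminus, genome_len - new_terminus


def size_difference(r1: int, r2: int) -> float:
    # Assumed definition: difference relative to the larger replichore.
    return abs(r1 - r2) / max(r1, r2)


def is_balanced(genome_len: int, terminus: int, start: int, end: int,
                max_diff: float = 0.20) -> bool:
    r1, r2 = replichores_after_inversion(genome_len, terminus, start, end)
    return size_difference(r1, r2) <= max_diff


L, TER = 2_000_000, 900_000   # hypothetical genome: replichores of 900 kb and 1.1 Mb
# Breakpoints equidistant from the origin (start + end == L) only swap the two lengths.
print(replichores_after_inversion(L, TER, 800_000, 1_200_000))  # (1100000, 900000)
# Unequal arms move the terminus and unbalance the replichores.
print(replichores_after_inversion(L, TER, 800_000, 1_600_000))  # (1500000, 500000)
```

The model tracks only replichore lengths and ignores gene content and strand. It simply shows that an inversion whose breakpoints are equidistant from the origin leaves the pair of replichore sizes unchanged, while an asymmetric inversion around the terminus can push the difference past a 20% limit.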
![Page 4: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/4.jpg)
A dot plot: Each dot is a pairwise (or n-way) local alignment
![Page 5: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/5.jpg)
Goal: Identify local homologous (orthologous) segments
Blue: same strand
Red: opposite strand
![Page 6: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/6.jpg)
Tools for segmental homology detection
GRIMM-Synteny (Pevzner et al. 2003, Bourque et al. 2004): cluster markers within a fixed distance
FISH (Vision et al. 2003): find statistically over-represented clusters of markers within a fixed distance
LineUp (Hampson et al. 2003): find collinear runs of markers among pairs of genomes, allowing degeneracy
Some alignment tools: Shuffle-LAGAN (Brudno et al. 2003), Mauve (Darling et al. 2004)
![Page 7: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/7.jpg)
![Page 8: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/8.jpg)
![Page 9: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/9.jpg)
Small segments separated by lineage-specific regions may not be detected by methods based strictly on distance.
Key idea: use a combination of conserved marker order (collinearity) and alignment score
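A toy example shows how a fixed distance cutoff can split a conserved segment that marker order keeps together. The sketch below is a simplified, hypothetical illustration and is not the GRIMM-Synteny or FISH algorithm. Markers are listed in genome-A order as (position in A, position in B) pairs. They are chained either when consecutive markers lie within `max_gap` bases in both genomes, or when they are also consecutive in genome B regardless of physical distance. For brevity, only forward (same-strand) runs are handled.

```python
from typing import List, Tuple

Marker = Tuple[int, int]  # (position in genome A, position in genome B)


def chain_by_distance(markers: List[Marker], max_gap: int) -> List[List[int]]:
    """Group consecutive markers (genome-A order) whose gaps in both genomes are <= max_gap."""
    if not markers:
        return []
    groups = [[0]]
    for i in range(1, len(markers)):
        (a0, b0), (a1, b1) = markers[i - 1], markers[i]
        if abs(a1 - a0) <= max_gap and abs(b1 - b0) <= max_gap:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def chain_by_order(markers: List[Marker]) -> List[List[int]]:
    """Group consecutive markers that are also consecutive, in forward order, in genome B."""
    if not markers:
        return []
    order_b = sorted(range(len(markers)), key=lambda i: markers[i][1])
    rank_b = {m: r for r, m in enumerate(order_b)}
    groups = [[0]]
    for i in range(1, len(markers)):
        if rank_b[i] - rank_b[i - 1] == 1:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


# A hypothetical 55 kb lineage-specific insertion in genome A sits between markers 2 and 3.
markers = [(1000, 5000), (3000, 7000), (5000, 9000), (60000, 11000), (62000, 13000)]
print(chain_by_distance(markers, max_gap=10_000))  # [[0, 1, 2], [3, 4]]
print(chain_by_order(markers))                     # [[0, 1, 2, 3, 4]]
```

Order-based chaining alone would also happily chain spurious hits. That is why the approach above pairs collinearity with an alignment score rather than using either criterion by itself.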
![Page 10: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/10.jpg)
Finding conserved regions: A pseudo-Gibbs sampler method
Given: a set ℳ of M monotypic markers
Do: assign a posterior probability that each marker m ∈ ℳ is part of a conserved region
Use MCMC methodology to sample the frequency of each marker’s inclusion in high-scoring configurations.
Use this frequency as an estimate of the “posterior probability”
![Page 11: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/11.jpg)
Finding conserved regions: A pseudo-Gibbs sampler method
Define a configuration X as a vector of M binary random variables:
e.g. X = (X_1, X_2, …, X_M)
A configuration value x_j maps marker m_j to either signal (1) or noise (0)
e.g. x = (0, 1, 0, 0, 1, 1, …, 1, 0)
There are 2^M possible configurations
Run a Markov chain of length N over configuration space: (X^(1), X^(2), …, X^(N))
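Because a configuration is just a bit vector, the configuration space can be listed exhaustively only for tiny M. The illustrative snippet below enumerates every configuration for M = 3 and shows how quickly 2^M grows. This growth is the reason to sample configurations with a Markov chain instead of scoring them all.

```python
from itertools import product


def all_configurations(m: int):
    """Every binary configuration x = (x_1, ..., x_m); there are 2**m of them."""
    return list(product((0, 1), repeat=m))


configs = all_configurations(3)
print(len(configs))         # 8
print(configs[5])           # (1, 0, 1)
print(2 ** 300 > 10 ** 90)  # True: enumeration is hopeless for a few hundred markers
```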
![Page 12: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/12.jpg)
Sample possible marker configurations
Start with a random initial configuration, THEN:
Select a marker, sample whether it should be a 0 or 1 based on the current configuration
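$$\mathrm{Score}(m_j \mid \mathbf{x}) = \sum_{v=L_j}^{j-1} w_v x_v + w_j + \sum_{v=j+1}^{R_j} w_v x_v$$
The first term is the sum of scores for all collinear markers to the left of m_j, w_j is the score of marker j, and the last term is the sum of scores for all collinear markers to the right (L_j and R_j bound the run of markers collinear with m_j).
w_v is the score of marker v; x_v is its configuration value (0 or 1)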
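The score above can be sketched in code once "collinear" is pinned down, and that definition is an assumption here rather than something stated in the talk. In this hypothetical version, markers are indexed in genome-A order, `pos_b[v]` is marker v's position in genome B, and `strand[v]` is +1 (same strand) or −1 (opposite strand). Two neighbouring markers count as collinear when they share a strand and are also neighbours in genome B in the direction that strand implies. Collinearity is judged only among m_j and markers currently labelled signal (x_v = 1), so noise markers contribute nothing and do not break a run.

```python
from typing import Sequence


def score(j: int, x: Sequence[int], pos_b: Sequence[float],
          strand: Sequence[int], w: Sequence[float]) -> float:
    """Score(m_j | x) = w_j + summed w_v of signal markers collinear with m_j.

    Assumption: collinearity is judged only among m_j and the markers currently
    labelled signal (x_v = 1), so noise markers do not interrupt a run.
    """
    active = [v for v in range(len(x)) if x[v] == 1 or v == j]
    # Rank the active markers by their position in genome B.
    order_b = sorted(active, key=lambda v: pos_b[v])
    rank_b = {v: r for r, v in enumerate(order_b)}

    def adjacent(u: int, v: int) -> bool:
        # u precedes v in genome A; they are collinear if they share a strand
        # and are neighbours in genome B in the direction that strand implies.
        return strand[u] == strand[v] and rank_b[v] - rank_b[u] == strand[u]

    p = active.index(j)
    total = w[j]                                   # score of marker j
    left = p
    while left > 0 and adjacent(active[left - 1], active[left]):
        left -= 1
        total += w[active[left]]                   # collinear marker to the left
    right = p
    while right < len(active) - 1 and adjacent(active[right], active[right + 1]):
        right += 1
        total += w[active[right]]                  # collinear marker to the right
    return total


# Marker 3 is a noise marker far away in genome B; markers 0, 1, 2, 4 form one run.
pos_b = [10, 20, 30, 500, 40]
print(score(1, [1, 1, 1, 0, 1], pos_b, [1] * 5, [1.0] * 5))  # 4.0
```

The toy call shows the intended effect. Once marker 3 is labelled noise, markers 0, 1, 2 and 4 read as one collinear run. Scored on its own, marker 3 gets only its own weight.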
![Page 13: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/13.jpg)
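Transform LCB (locally collinear block) score to probability
The scale parameter c is used together with a sigmoid to map a marker’s score to a probability:
$$P(X_j = 1 \mid \mathbf{x}_{n \neq j}) = \frac{e^{\mathrm{Score}(m_j \mid \mathbf{x})/c}}{1 + e^{\mathrm{Score}(m_j \mid \mathbf{x})/c}}$$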
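The transform above is the logistic function of Score/c. A direct translation is sketched below, written in the algebraically equivalent form 1 / (1 + e^(−s/c)) for non-negative arguments so that large scores of either sign do not overflow. The score value 40 in the example is hypothetical. The three c values are the ones listed later in the segment-size analysis.

```python
import math


def inclusion_probability(score: float, c: float) -> float:
    """P(X_j = 1 | rest) = e^(s/c) / (1 + e^(s/c)), computed in a numerically stable way."""
    z = score / c
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


for c in (20, 30, 75):
    print(c, round(inclusion_probability(40.0, c), 3))
```

A larger c flattens the sigmoid, so the same score maps to a probability closer to 0.5. A smaller c makes the sampler more decisive about each marker.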
![Page 14: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/14.jpg)
Sample a new value for xj
Set x_j to 1 with probability given by the marker’s score transformation
First allow the chain a “burn-in” period, then continue for many iterations.
The frequency, or “posterior probability”, of m_j is:
$$\hat{P}(m_j) = \frac{\#\,\text{samples with } x_j = 1}{\#\,\text{samples}}$$
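The sampling loop can be sketched as follows. Start from a random configuration, repeatedly pick a marker, resample its value from the sigmoid of its score, and after burn-in count how often each marker is 1. This is a hypothetical implementation, not the authors' code. `score(j, x)` stands for any function returning Score(m_j | x). Choosing the marker uniformly at random each step is an assumption, since a systematic sweep would also fit the description.

```python
import math
import random
from typing import Callable, List


def sigmoid(z: float) -> float:
    # Numerically stable e^z / (1 + e^z).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def pseudo_gibbs(m: int, score: Callable[[int, List[int]], float], c: float,
                 n_iter: int, burn_in: int, seed: int = 0) -> List[float]:
    """Estimate each marker's 'posterior probability' as its post-burn-in frequency of x_j = 1."""
    if n_iter <= burn_in:
        raise ValueError("n_iter must exceed burn_in")
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(m)]   # random initial configuration
    ones = [0] * m
    kept = 0
    for it in range(n_iter):
        j = rng.randrange(m)                     # select a marker (uniformly, by assumption)
        p = sigmoid(score(j, x) / c)             # transform its score to a probability
        x[j] = 1 if rng.random() < p else 0      # sample a new value for x_j
        if it >= burn_in:                        # record configurations only after burn-in
            kept += 1
            for v in range(m):
                ones[v] += x[v]
    return [count / kept for count in ones]


# Toy score (hypothetical): even-numbered markers score +50, odd ones -50.
pp = pseudo_gibbs(4, lambda j, x: 50.0 if j % 2 == 0 else -50.0, c=10.0,
                  n_iter=5000, burn_in=500)
print(pp)
```

With a real Score function the probability for m_j depends on the current labels of its neighbours. That dependence is what lets the chain settle on high-scoring collinear runs.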
![Page 15: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/15.jpg)
Our method assigns each marker a posterior probability (p.p.)
Threshold γ separates signal from noise
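Thresholding is a one-line filter over the posterior probabilities. In the sketch below the values are hypothetical, and whether markers exactly at γ count as signal (≥ rather than >) is an assumption.

```python
from typing import List, Sequence


def signal_markers(pp: Sequence[float], gamma: float) -> List[int]:
    """Indices of markers whose posterior probability reaches the threshold gamma."""
    return [j for j, p in enumerate(pp) if p >= gamma]


pp = [0.95, 0.12, 0.48, 0.33, 0.81]   # hypothetical posterior probabilities
print(signal_markers(pp, 0.50))       # [0, 4]
print(signal_markers(pp, 0.30))       # [0, 2, 3, 4]
```

Lowering γ admits more markers as signal, which trades specificity for the ability to keep smaller or weaker segments.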
![Page 16: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/16.jpg)
Using γ = .5, the X pattern appears
![Page 17: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/17.jpg)
![Page 18: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/18.jpg)
Application to 4 divergent Streptococcus genomes
Markers are reciprocal best blastp hits of ORFs among:
S. agalactiae
S. pyogenes
S. pneumoniae
S. mutans
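Reciprocal best hits can be computed from two directional hit lists. The pairwise sketch below is hypothetical. It assumes each hit has already been reduced to a (query ORF, subject ORF, bit score) tuple, leaves out parsing blastp output, and breaks ties by keeping the first hit seen. Extending it to markers shared "among" all four genomes would need an extra consistency rule that is not described here.

```python
from typing import Dict, Iterable, Set, Tuple

Hit = Tuple[str, str, float]  # (query ORF, subject ORF, bit score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Highest-scoring subject for each query (the first one wins on ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, bits in hits:
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


a_vs_b = [("a1", "b1", 300.0), ("a1", "b2", 120.0), ("a2", "b2", 250.0), ("a3", "b2", 90.0)]
b_vs_a = [("b1", "a1", 310.0), ("b2", "a2", 260.0), ("b2", "a3", 95.0)]
print(sorted(reciprocal_best_hits(a_vs_b, b_vs_a)))  # [('a1', 'b1'), ('a2', 'b2')]
```

Here a3's best hit is b2, but b2's best hit is a2, so a3 does not become a marker.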
![Page 19: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/19.jpg)
What is the distribution of segment sizes in Streptococci? As resolution increases, large segments are broken up by smaller segments.
[Figure: histograms of segment sizes at four resolutions; x-axis: segment size (markers per segment), y-axis: number of LCBs]
Low resolution (c = 75, γ = .45): 26 total segments
Medium resolution (c = 30, γ = .45): 32 total segments
High-1 resolution (c = 20, γ = .50): 57 total segments
High-2 resolution (c = 20, γ = .30): 72 total segments
![Page 20: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/20.jpg)
What was the ancestral genome organization?
Try building an inversion phylogeny by applying GRIMM and MGR to the 57 high-resolution segments
![Page 21: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/21.jpg)
Failed: The suggested rearrangements do not maintain replichore balance
![Page 22: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/22.jpg)
Try using the 26 larger, low-resolution segments
Surprise! A success:
![Page 23: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/23.jpg)
Transforming S. agalactiae into S. pyogenes
![Page 24: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/24.jpg)
Conclusions
- The pseudo-Gibbs sampler method detects collinear segments at a variety of scales
- It would be nice to have an inversion phylogeny inference tool that accounts for replichore balance!
- Large segments in Streptococci appear to rearrange by symmetric inversions
- Small segments? An open problem.
![Page 25: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/25.jpg)
Future directions
Can a biologically relevant full joint probability distribution be expressed over configurations?
- If so, then a true Gibbs sampler could be employed
Problems:
- Different kinds of rearrangement occur with different frequencies (e.g. symmetric inversions about the terminus vs. IS-mediated translocation)
- Distinguishing rearrangement by horizontal transfer (H.T.), gene duplication and subsequent loss, symmetric inversion, etc.
![Page 26: Identifying conserved segments in rearranged and divergent genomes](https://reader036.fdocuments.in/reader036/viewer/2022062321/56813b7d550346895da499ac/html5/thumbnails/26.jpg)
Acknowledgements
Bob Mau – did most of this work
My Ph.D. advisers: Nicole T. Perna and Mark Craven
Others who have contributed insight: Jeremy Glasner, Fred Blattner, Eric Cabot, GEL@UW-Madison
Grant money: NIH Grant GM62994-02; NLM Training Grant 5T15M007359-03 to A.E.D.