Genome Rearrangement By Ghada Badr Part I.
Genome, chromosome, gene, gene order

The entire complement of genetic material carried by an individual is called the genome. Each genome contains one or more DNA molecules, one per chromosome.

A gene is a segment of DNA sequence with a specific function.

Genes can be ordered by their location in the DNA sequence. DNA consists of two complementary strands twisted around each other to form a right-handed double helix. A sign (+/-) is usually used to indicate on which strand a gene is located.
[Figure: two DNA strands with 5' and 3' ends; genes A, C, D, F and B, E. Gene order: A -B C D E F]

The DNA molecule (chromosome) may be circular or linear.
[Figure: chromosomes with genes A B C D E F H I K J]

Genome Rearrangement
[Figure: A -B C D -E F rearranged into B -E F -D A C]
The genome is structurally specific to each species, and it changes only slowly over time. Genome comparison among different species can therefore provide much evidence about evolution. Genome rearrangements are an important aspect of the evolution of species.

Even when the gene content of two genomes is almost identical, gene order can be quite different.
Genome 1: A -B C D -E F
Genome 2: B -E F -D A C

Gene order analysis on a set of organisms is a powerful technique for genomic comparison and phylogenetic inference.

General definition of the problem:
Given a set of genomes and a set of possible evolutionary events (operations), find a shortest set of events transforming (sorting) those genomes into one another.
What counts as a genome and which events are allowed account for the diversity of the problem. Since these events are rare, scenarios minimizing their number are more likely to be close to reality. Many models have been proposed.

Genome Models

Genes (or blocks of contiguous genes) are a good example of homologous markers: segments of genomes that can be found in several species.
The simplest possible model is:
The order of genes in each genome is known,
All the genomes share the same set of genes,
All genomes contain a single copy of each gene, and
All genomes consist of a single chromosome.

Genomes can be modeled by permutations: each gene is assigned a unique number and is found exactly once in the genome.
Signed permutation: each gene is assigned a + or - sign to indicate the strand it resides on.
Unsigned permutation: used when the corresponding strand is unknown.

Permutations
Genes (markers) are represented by integers, with a + or - sign to indicate the strand they lie on. The order and orientation of the genes of one genome in relation to the other is represented by a signed permutation $\pi = (\pi_1, \pi_2, \ldots, \pi_{n-1}, \pi_n)$ of size $n$ over $\{-n, \ldots, -1, 1, \ldots, n\}$, such that for each $i$ from 1 to $n$, either $i$ or $-i$ must appear, but not both.

Identity permutation:
The identity permutation is $\iota_n = (1, 2, 3, \ldots, n)$. When multiple genomes with the same gene content are compared, one of them is chosen as a base (reference), i.e., represented as $\iota_n$, and identical genes in all the other genomes are given the same integer values.
Sorted/unsorted permutation:
Sorting a permutation $\pi$ means applying some operations on $\pi$ to change it into $\iota_n$.
If $\pi_1 = \pi_2$, we say that $\pi_1$ is sorted with respect to $\pi_2$.
If $\pi_1 \neq \pi_2$, we say that $\pi_1$ is unsorted with respect to $\pi_2$.

Example: Mitochondrial Genomes of 6 Arthropoda
(Fruit Fly, Mosquito, Silkworm, Locust, Tick, Centipede)
$\pi_1 = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)$
$\pi_2 = (1, 2, 3, 4, 5, 6, 8, 7, 9, -10, 11, 12, 13, 14, 15, 16, 17)$
$\pi_3 = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 13, 15, 16, 17)$
$\pi_4 = (1, 2, 3, 5, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)$
$\pi_5 = (1, 3, 4, 5, 6, 7, 8, 9, 10, 11, -2, 12, 13, 14, 15, 16, 17)$
$\pi_6 = (1, 3, 4, 5, 6, 7, 8, 9, 10, 11, -2, 12, 16, 13, 14, 15, 17)$
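As a minimal sketch of the definitions above, the six example genomes can be written as Python tuples and checked against the signed-permutation rule (each i from 1 to n appears exactly once, as i or -i). The names `is_signed_permutation`, `identity`, `is_sorted`, and `GENOMES` are illustrative choices, not part of the lecture.

```python
def is_signed_permutation(pi):
    """True if, for each i in 1..n, exactly one of i or -i appears in pi."""
    n = len(pi)
    return sorted(abs(x) for x in pi) == list(range(1, n + 1))


def identity(n):
    """The identity permutation (1, 2, ..., n)."""
    return tuple(range(1, n + 1))


def is_sorted(pi):
    """True if pi equals the identity permutation of the same size."""
    return tuple(pi) == identity(len(pi))


# The six permutations from the Arthropoda example (keys are hypothetical labels).
GENOMES = {
    "pi1": (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17),
    "pi2": (1, 2, 3, 4, 5, 6, 8, 7, 9, -10, 11, 12, 13, 14, 15, 16, 17),
    "pi3": (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 13, 15, 16, 17),
    "pi4": (1, 2, 3, 5, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17),
    "pi5": (1, 3, 4, 5, 6, 7, 8, 9, 10, 11, -2, 12, 13, 14, 15, 16, 17),
    "pi6": (1, 3, 4, 5, 6, 7, 8, 9, 10, 11, -2, 12, 16, 13, 14, 15, 17),
}

for name, pi in GENOMES.items():
    print(name, "valid:", is_signed_permutation(pi), "sorted:", is_sorted(pi))
```

Only the first genome is sorted here, because it was chosen as the reference.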
Linear and circular permutation:
$\pi$ is linear when it represents a linear chromosome, or circular when it represents a circular chromosome.
When $\pi = (\pi_1, \pi_2, \ldots, \pi_{n-1}, \pi_n)$ is circular, $\bar{\pi} = (-\pi_n, -\pi_{n-1}, \ldots, -\pi_2, -\pi_1)$ and all permutations obtained by shifts on $\pi$ or $\bar{\pi}$,
$\mathrm{shift}(\pi, i) = (\pi_{n-i+1}, \pi_{n-i+2}, \ldots, \pi_{n-1}, \pi_n, \pi_1, \ldots, \pi_{n-i})$,
are all equivalent.
Example: (-3, 2, 1, -4) and (-1, -2, 3, 4)

Points in permutations
For a given permutation $\pi = (\pi_1, \pi_2, \ldots, \pi_{n-1}, \pi_n)$, there is a point between each pair of consecutive values $\pi_i$ and $\pi_{i+1}$ in $\pi$.
If $\pi$ is linear: there are two additional points, one before $\pi_1$ and one after $\pi_n$.
If $\pi$ is circular: there is one additional point between $\pi_n$ and $\pi_1$.
$\mathrm{pts}(\pi) = n+1$ if $\pi$ is linear, and $\mathrm{pts}(\pi) = n$ if $\pi$ is circular.

Linear extension of a permutation:
For a given $\pi = (\pi_1, \pi_2, \ldots, \pi_{n-1}, \pi_n)$:
If $\pi$ is linear: a linear extension of $\pi$ is $(0, \pi_1, \pi_2, \ldots, \pi_{n-1}, \pi_n, n+1)$.
If $\pi$ is circular (assuming $\pi_n = n$): a linear extension of $\pi$ is $(0, \pi_1, \pi_2, \ldots, \pi_{n-1}, n)$.

Example: $\pi = (4, 8, 9, 7, 6, 5, 1, 3, 2)$ has the linear extension $(0, 4, 8, 9, 7, 6, 5, 1, 3, 2, 10)$. Then $\mathrm{pts}(\pi) = 10$.

Now: we want to compare our genomes.
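The notions of circular equivalence, points, and linear extension can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the circular extension assumes the permutation has already been rotated so that its last value is n.

```python
def reverse_complement(pi):
    """Read the chromosome from the other strand: reverse the order, flip the signs."""
    return tuple(-x for x in reversed(pi))


def shift(pi, i):
    """Move the last i values of pi to the front."""
    pi = tuple(pi)
    i %= len(pi)
    return pi[len(pi) - i:] + pi[:len(pi) - i]


def circular_equivalents(pi):
    """All permutations obtained by shifts on pi or on its reverse complement."""
    pi = tuple(pi)
    forms = set()
    for base in (pi, reverse_complement(pi)):
        for i in range(len(pi)):
            forms.add(shift(base, i))
    return forms


def are_equivalent_circular(a, b):
    """True if a and b describe the same circular chromosome."""
    return tuple(b) in circular_equivalents(a)


def pts(pi, circular=False):
    """Number of points: n + 1 for a linear permutation, n for a circular one."""
    return len(pi) if circular else len(pi) + 1


def linear_extension(pi, circular=False):
    """Frame pi with 0 (and with n + 1 when pi is linear)."""
    pi = tuple(pi)
    if circular:
        if pi[-1] != len(pi):
            raise ValueError("circular case assumes the last value is n")
        return (0,) + pi
    return (0,) + pi + (len(pi) + 1,)


print(are_equivalent_circular((-3, 2, 1, -4), (-1, -2, 3, 4)))
print(linear_extension((4, 8, 9, 7, 6, 5, 1, 3, 2)))
print(pts((4, 8, 9, 7, 6, 5, 1, 3, 2)))
```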
Problem: Given two genomes, how do we measure their similarity and/or distance?
A related problem: Given two permutations, how do we measure their similarity and/or distance?

Permutations - similarity/distance

A distance measure should be a metric on the set of genomes. A metric $d$ on a set $S$ ($d: S \times S \to \mathbb{R}$) satisfies the following three axioms:
Positivity: for all $s, t$ in $S$, $d(s,t) \geq 0$, and $d(s,t) = 0$ iff $s = t$.
Symmetry: for all $s, t$ in $S$, $d(s,t) = d(t,s)$.
Triangular inequality: for all $s, t, u$ in $S$, $d(s,u) \leq d(s,t) + d(t,u)$.
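For a small finite set, the three axioms can be checked by brute force. The sketch below is an illustration only: `is_metric` is a hypothetical helper, and the two candidate functions are toy examples rather than genome distances.

```python
from itertools import product


def is_metric(d, items):
    """Brute-force check of positivity, symmetry, and the triangular inequality."""
    items = list(items)
    for s, t in product(items, repeat=2):
        if d(s, t) < 0:
            return False          # distances must be non-negative
        if (d(s, t) == 0) != (s == t):
            return False          # d(s, t) = 0 exactly when s = t
        if d(s, t) != d(t, s):
            return False          # symmetry
    for s, t, u in product(items, repeat=3):
        if d(s, u) > d(s, t) + d(t, u):
            return False          # triangular inequality
    return True


def absolute_difference(a, b):
    return abs(a - b)


def squared_difference(a, b):
    return (a - b) ** 2


print(is_metric(absolute_difference, range(5)))  # a metric on these integers
print(is_metric(squared_difference, range(5)))   # violates the triangular inequality
```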
Measures of similarity between permutations used in computational biology are numerous in the literature. The first measures used (they will be useful later on) are:
Breakpoints (introduced by Sankoff and Blanchette (1997))
Common intervals

Permutations-distance - Breakpoints
When analyzing $\pi$ with respect to $\sigma$, each point in $\pi$ can be an adjacency or a breakpoint.
A point (pair of consecutive values) $(\pi_i, \pi_{i+1})$ in $\pi$ is an adjacency between $\pi$ and $\sigma$ when either $(\pi_i, \pi_{i+1})$ or $(-\pi_{i+1}, -\pi_i)$ are consecutive in $\sigma$.
If $\pi$ is linear: there is an adjacency before $\pi_1$ if $\pi_1$ is also the first value in $\sigma$, and an adjacency after $\pi_n$ if $\pi_n$ is also the last value in $\sigma$.
If $\pi$ is circular: we assume that $\pi_n$ is also the last value in $\sigma$, and $(\pi_n, \pi_1)$ is an adjacency if $\pi_1$ is also the first value in $\sigma$.

$\mathrm{brp}(\pi) = \mathrm{pts}(\pi) - \mathrm{adj}(\pi)$, where:
$\mathrm{pts}(\pi)$ is the number of points in $\pi$.
$\mathrm{adj}(\pi)$ is the number of adjacencies.
If $\pi$ is sorted ($\pi = \sigma$): $\pi$ has only adjacencies and no breakpoints ($\mathrm{brp}(\pi) = 0$).
If $\pi$ is unsorted ($\pi \neq \sigma$): $\pi$ has at least one breakpoint ($\mathrm{brp}(\pi) > 0$).
Breakpoint distance counts the lost adjacencies between genomes. The breakpoint distance between $\pi$ and $\sigma$ is the number of breakpoints of $\pi$ with respect to $\sigma$, $\mathrm{brp}(\pi)$.
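Back to our example:
$\pi = (4, 8, 9, 7, 6, 5, 1, 3, 2)$, with linear extension $(0, 4, 8, 9, 7, 6, 5, 1, 3, 2, 10)$. Then $\mathrm{pts}(\pi) = 10$. What are $\mathrm{brp}(\pi)$ and the adjacencies?
With respect to $\iota_n$ (reading $\pi$ as an unsigned permutation, so consecutive values in either order count), the adjacencies are (8,9), (7,6), (6,5), (3,2), so $\mathrm{adj}(\pi) = 4$.
$\mathrm{brp}(\pi) = \mathrm{pts}(\pi) - \mathrm{adj}(\pi) = 10 - 4 = 6$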
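The count in this example can be reproduced with a short sketch. It assumes the permutation is linear and unsigned and is compared against the identity, so that a point is an adjacency exactly when its two values differ by 1; the function name is hypothetical.

```python
def breakpoints_unsigned_linear(pi):
    """Return (points, adjacencies, breakpoints) of pi with respect to the identity.

    Assumes an unsigned linear permutation: pi is framed with 0 and n + 1, and a
    pair of neighbouring values is an adjacency when the values differ by 1.
    """
    ext = (0,) + tuple(pi) + (len(pi) + 1,)
    points = len(ext) - 1
    adjacencies = [(a, b) for a, b in zip(ext, ext[1:]) if abs(a - b) == 1]
    return points, adjacencies, points - len(adjacencies)


points, adjacencies, brp = breakpoints_unsigned_linear((4, 8, 9, 7, 6, 5, 1, 3, 2))
print(points)       # 10
print(adjacencies)  # [(8, 9), (7, 6), (6, 5), (3, 2)]
print(brp)          # 6
```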
Breakpoint distance is based on the notion of conserved adjacencies and can be defined on a set of more than two genomes. It is easy to compute, but it fails to capture more global relations between genomes. The first generalization of adjacencies is the notion of common intervals.

Permutations-distance - Common Intervals
Common intervals: subsets of genes that appear consecutively together in two or more genomes, where the genes are the same in each interval but may not be in the same order or orientation.
Example (circular chromosomes):
$\pi_1 = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)$
$\pi_2 = (1, 2, 3, 4, 5, 6, 8, 7, 9, -10, 11, 12, 13, 14, 15, 16, 17)$
$\pi_3 = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 13, 15, 16, 17)$
$\pi_4 = (1, 2, 3, 5, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)$
$\pi_5 = (1, 3, 4, 5, 6, 7, 8, 9, 10, 11, -2, 12, 13, 14, 15, 16, 17)$
$\pi_6 = (1, 3, 4, 5, 6, 7, 8, 9, 10, 11, -2, 12, 16, 13, 14, 15, 17)$
Comparing the first 4 species: they share 6 adjacencies: {1,2}, {2,3}, {11,12}, {15,16}, {16,17}, {17,1}.
Comparing all 6 species: they share only 1 adjacency: {17,1}.
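The shared adjacencies quoted above can be recomputed with a small sketch. It models each signed gene by its two ends (tail "t" and head "h"), an assumption made here so that an adjacency is the same whichever strand the circular chromosome is read from; all names are illustrative.

```python
def extremities(x):
    """(left end, right end) of signed gene x as it is read in the permutation."""
    g = abs(x)
    return ((g, "t"), (g, "h")) if x > 0 else ((g, "h"), (g, "t"))


def circular_adjacencies(pi):
    """Set of adjacencies of a circular signed permutation, as pairs of gene ends."""
    n = len(pi)
    adj = set()
    for k in range(n):
        right_of_a = extremities(pi[k])[1]
        left_of_b = extremities(pi[(k + 1) % n])[0]
        adj.add(frozenset((right_of_a, left_of_b)))
    return adj


def shared_gene_pairs(genomes):
    """Gene pairs whose adjacency is conserved in every genome."""
    common = set.intersection(*(circular_adjacencies(p) for p in genomes))
    return sorted(tuple(sorted(g for g, _ in a)) for a in common)


PI = [
    (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17),
    (1, 2, 3, 4, 5, 6, 8, 7, 9, -10, 11, 12, 13, 14, 15, 16, 17),
    (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 13, 15, 16, 17),
    (1, 2, 3, 5, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17),
    (1, 3, 4, 5, 6, 7, 8, 9, 10, 11, -2, 12, 13, 14, 15, 16, 17),
    (1, 3, 4, 5, 6, 7, 8, 9, 10, 11, -2, 12, 16, 13, 14, 15, 17),
]

print(shared_gene_pairs(PI[:4]))  # six conserved adjacencies
print(shared_gene_pairs(PI))      # only the adjacency between 17 and 1
```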
The six permutations are very similar. The genes in the interval [1,12] are all the same, as are the genes in the intervals [3,6], [6,9], [9,11], and [12,17].
We can use common intervals as a measure of similarity between species.
Disadvantage: none of these measures reflects rearrangement operations or explains what happened to the genome over time.

Rearrangement operations (events)

Back to our original problem: Given a set of genomes and a set of possible evolutionary events (operations), find a shortest set of events transforming those genomes into one another.
What are the rearrangement events (operations)?
These events (operations) can be applied to a single gene or to a group of genes (intervals).
[Figure: example on the mitochondrial genomes of 6 Arthropoda: Fruit Fly, Mosquito, Silkworm, Locust, Tick, Centipede]

Rearrangement operations affect gene order and gene content. There are various types.
In the case of single-chromosome genomes:
Inversions
Transpositions
Reverse transpositions
Gene duplications
Gene loss
In the case of multiple-chromosome genomes we add:
Translocations
Fusions
Fissions

Rearrangement Operations - Single Chro.
Inversion
[Figures: an inversion, with an example on the mitochondrial genomes of 6 Arthropoda (Fruit Fly, Mosquito, Silkworm, Locust, Tick, Centipede)]

Transposition
[Figures: a transposition, with an example on the same six genomes]

Reverse Transposition
[Figures: a reverse transposition, with an example on the same six genomes]

Rearrangement Operations - Multiple Chro.

Translocation
[Figures: translocation]

Fusion, Fission
[Figures: fusion and fission; from 24 chromosomes to 21 chromosomes. Source: Linda Ashworth, LLNL, DOE Human Genome Program Report]

Rearrangement Problems
Back to our original problem: Given a set of genomes and a set of possible evolutionary events (operations), find a shortest set of events transforming those genomes into one another.
Any set of operations yields a distance between genomes, by counting the minimum number of operations needed to transform one genome into the other.

Two classical problems:
Computing the distance $d(\pi)$.
Computing one optimal sorting sequence of events.

Reversal Distance - Sorting by Reversals
Given a permutation $\pi$, calculate the reversal distance $d(\pi)$ and find one optimal sequence of reversals sorting $\pi$.
Assumptions:
Only reversals are allowed.
No gene is duplicated.
Genomes are unichromosomal.

A reversal is represented as a set of genes appearing together in the given genome.
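A reversal on a signed permutation reverses the order of a contiguous segment and flips the sign of every gene in it. The sketch below illustrates this, together with a helper that takes the reversal as a set of genes, as described above; both function names and the 0-based indexing are choices made for this example.

```python
def apply_reversal(pi, i, j):
    """Reverse the segment pi[i..j] (0-based, inclusive) and flip its signs."""
    pi = tuple(pi)
    segment = tuple(-x for x in reversed(pi[i:j + 1]))
    return pi[:i] + segment + pi[j + 1:]


def reversal_by_genes(pi, genes):
    """Apply the reversal given as a set of (unsigned) genes that sit together in pi."""
    positions = [k for k, x in enumerate(pi) if abs(x) in genes]
    if len(positions) != len(genes):
        raise ValueError("some genes of the reversal are not in the permutation")
    if not positions or positions[-1] - positions[0] + 1 != len(positions):
        raise ValueError("the genes of a reversal must appear together")
    return apply_reversal(pi, positions[0], positions[-1])


print(apply_reversal((1, 2, 3, 4), 1, 2))         # (1, -3, -2, 4)
print(reversal_by_genes((1, -3, -2, 4), {2, 3}))  # (1, 2, 3, 4)
```

Applying the same reversal twice restores the original permutation.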
This approach is symmetric.

Reversal graph for n = 3
Vertices: all (signed) permutations of size n = 3.
Edges: connect $\pi_1$ and $\pi_2$ by an edge if the reversal distance $d(\pi_1, \pi_2) = 1$.
The reversal distance $d(\pi_i, \pi_k)$ is the length of a shortest path between $v_i$ and $v_k$.
The graph is huge: $|V| = n! \cdot 2^n$. A graph-search algorithm is not feasible in general!
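For n = 3 the reversal graph is still small enough to explore exhaustively. The sketch below runs a breadth-first search from the identity over signed permutations; it is meant only to illustrate the shortest-path view of the distance (function names are hypothetical), and it stops being practical very quickly as n grows.

```python
from collections import deque


def neighbors(pi):
    """All permutations reachable from pi by exactly one reversal."""
    n = len(pi)
    for i in range(n):
        for j in range(i, n):
            yield pi[:i] + tuple(-x for x in reversed(pi[i:j + 1])) + pi[j + 1:]


def reversal_distances(n):
    """Map every signed permutation of size n to its reversal distance from the identity."""
    start = tuple(range(1, n + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for nxt in neighbors(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist


dist = reversal_distances(3)
print(len(dist))             # 48 = 3! * 2**3 vertices
print(dist[(1, -2, 3)])      # 1
print(dist[(-1, -2, -3)])    # 3
```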
The classical approach for solving these two problems in polynomial time was developed by Hannenhalli and Pevzner (1995).
The reversal distance can be computed in $O(n)$ time, by Bader et al. (2000).
The fastest algorithm to find an optimal sorting sequence runs in subquadratic time (less than $O(n^2)$), by Tannier et al. (2007).
Most approaches are based on a special structure called the breakpoint graph.
Breakpoint graph: edges are black or gray.
Given $\pi = (\pi_1, \pi_2, \ldots, \pi_{n-1}, \pi_n)$:
If $\pi$ is linear: we add the values 0 and $n+1$, which represent the extremities of the chromosome, obtaining $(0, \pi_1, \pi_2, \ldots, \pi_{n-1}, \pi_n, n+1)$.
If $\pi$ is circular: assume $\pi_n = n$ and add only the value 0, obtaining $(0, \pi_1, \pi_2, \ldots, \pi_{n-1}, n)$.

Black edges: link each pair of consecutive values in $\pi$ by a horizontal line (a point in $\pi$).
Gray edges: link the extremities of black edges such that the values will be in order.
Graph: a collection of cycles in which black and gray edges alternate.
Trivial cycle: one black and one gray edge (an adjacency).
Long cycle: four or more edges (at least 2 breakpoints).
[Figures: breakpoint graph examples, including the graph when sorted]
$\pi = (-3, 2, 1, -4)$
[Figure: breakpoint graphs of $\pi$, linear and circular]
Linear and circular permutations differ in the construction of the breakpoint graph, but the analysis is the same.
[Figure: breakpoint graphs of $\pi = (-3, 2, 1, -4)$ and of the sorted permutation]
If $\pi$ is sorted: only adjacencies, no breakpoints. The breakpoint graph is a collection of trivial cycles. The number of cycles in the sorted graph is $\mathrm{cyc}(\pi) = \mathrm{pts}(\pi)$.
If $\pi$ is unsorted: at least one breakpoint, at least one long cycle. The number of cycles $\mathrm{cyc}(\pi)$ is at most $\mathrm{pts}(\pi) - 1$.
Observation: To sort a permutation $\pi$, we would like to increase the number of cycles in its breakpoint graph.

The effects of a reversal $\rho$ on the breakpoint graph of $\pi$:
Split reversal: $\mathrm{cyc}(\pi\rho) = \mathrm{cyc}(\pi) + 1$
Joint reversal: $\mathrm{cyc}(\pi\rho) = \mathrm{cyc}(\pi) - 1$
Neutral reversal: $\mathrm{cyc}(\pi\rho) = \mathrm{cyc}(\pi)$
Observation: To sort $\pi$, we must maximize the number of split reversals in the sorting sequence $s$.
If $s$ has only split reversals, what will be the reversal distance $d(\pi)$? (Hint: in terms of $\mathrm{pts}(\pi)$ and $\mathrm{cyc}(\pi)$.)
In that case $d(\pi) = \mathrm{pts}(\pi) - \mathrm{cyc}(\pi)$, since each split reversal adds one cycle until all $\mathrm{pts}(\pi)$ cycles are trivial.
Are we done?

A split reversal does not always exist, for example if all black edges in the graph have the same direction. In this case, we need to add some joint and/or neutral reversals to the sorting sequence $s$, so in general:
$d(\pi) \geq \mathrm{pts}(\pi) - \mathrm{cyc}(\pi)$
It is always possible to calculate the number of non-split reversals in a sorting sequence. It is the number of non-split reversals needed to sort the hard components of the graph that have no orientation, the unoriented components.
Unoriented components can form a hurdle, $\mathrm{hrd}(\pi)$, or, harder still, a fortress, $\mathrm{frt}(\pi)$, in the breakpoint graph.
Hurdles are very rare, and fortresses are even rarer, in permutations that represent real genomes. In practice, split reversals are usually sufficient to sort the permutation.

Can we choose any split reversal? No, only safe reversals.
Safe reversal: a split reversal that does not produce hurdles.
[Figure: an unsafe reversal and a safe reversal]
There is always a safe reversal for any oriented $\pi$.
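The final formula for the reversal distance $d(\pi)$ is:
$d(\pi) = \mathrm{pts}(\pi) - \mathrm{cyc}(\pi) + \mathrm{hrd}(\pi) + \mathrm{frt}(\pi)$
where:
$\mathrm{cyc}(\pi)$ is the number of cycles in the breakpoint graph, and $\mathrm{hrd}(\pi)$ is the number of hurdles.
$\mathrm{frt}(\pi) = 1$ if $\pi$ is a fortress, and 0 otherwise.
$\mathrm{pts}(\pi) = n+1$ if $\pi$ is linear, and $n$ if $\pi$ is circular.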
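The cycle term of this formula can be sketched for linear signed permutations. The code below uses the common doubling construction, which is an assumption of this example and may differ in drawing conventions from the lecture's figures: gene +x becomes the pair (2x-1, 2x), gene -x becomes (2x, 2x-1), black edges join neighbouring ends of consecutive genes, and gray edges join 2k and 2k+1. Hurdles and fortresses are not detected, so the function returns only the lower bound pts - cyc, not the exact distance.

```python
def breakpoint_graph_cycles(pi):
    """Count the alternating cycles in the breakpoint graph of a linear signed permutation."""
    n = len(pi)
    seq = [0]                                   # left frame element
    for x in pi:
        a = abs(x)
        seq += [2 * a - 1, 2 * a] if x > 0 else [2 * a, 2 * a - 1]
    seq.append(2 * n + 1)                       # right frame element

    black = {}                                  # black edges: one per point
    for k in range(0, len(seq), 2):
        u, v = seq[k], seq[k + 1]
        black[u], black[v] = v, u

    def gray(v):                                # gray edges join 2k and 2k + 1
        return v + 1 if v % 2 == 0 else v - 1

    seen = set()
    cycles = 0
    for start in seq:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:                    # follow black, then gray, until we return
            seen.add(v)
            w = black[v]
            seen.add(w)
            v = gray(w)
    return cycles


def reversal_lower_bound(pi):
    """pts(pi) - cyc(pi) for a linear permutation; ignores hurdles and fortresses."""
    return (len(pi) + 1) - breakpoint_graph_cycles(pi)


print(breakpoint_graph_cycles((-3, 2, 1, -4)))  # 1 cycle
print(reversal_lower_bound((-3, 2, 1, -4)))     # 5 - 1 = 4
```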
Algorithm: get an optimal sorting sequence $s$ that sorts $\pi$
Input: a signed permutation $\pi$.
Output: an optimal sequence of reversals sorting $\pi$.

Construct the breakpoint graph of $\pi$.
s ← [empty]
If frt($\pi$) = 1 then
    choose a reversal $\rho$ that eliminates the fortress
    $\pi$ ← $\pi\rho$
    s ← s · $\rho$   [concatenate the reversal to s]
End if
While there are hurdles in $\pi$ do
    choose a reversal $\rho$ that eliminates a hurdle
    $\pi$ ← $\pi\rho$;  s ← s · $\rho$
End while
While $\pi$ is not sorted do
    choose a safe split reversal $\rho$
    $\pi$ ← $\pi\rho$;  s ← s · $\rho$
End while
Return s
Complexity: $O(n^5)$
Tools: GRIMM and GRAPPA
We can have more than one optimal solution.

Conclusions
Represented linear and circular genomes as permutations in our simple model.
Described the first measures of similarity between permutations, breakpoints and common intervals --> these have no biological interpretation.
Used genome rearrangement events to describe similarity/distances between genomes --> this has more biological meaning.
Described in detail one distance measure (reversal distance) and the events (reversals) used to sort genomes.

Thank you
Questions?
Next Lecture?