Genome Rearrangements Unoriented Blocks. Quick Review Looking at evolutionary change through...
melanie-logan -
Genome Rearrangements
Unoriented Blocks
Quick Review
Looking at evolutionary change through reversals
Find the shortest possible series of reversals that transforms genome A into genome B
For unoriented blocks, this has been shown to be an NP-hard problem
Oriented Blocks
1 2 3 4 5
5 2 1 3 4
1 2 3 4 5
1 2 5 4 3
1 2 5 3 4
5 2 1 3 4
Unoriented Blocks
Orientation of the blocks in the genomes is unknown
2 1 3 7 5 4 8 6
1 2 3 4 5 6 7 8
Definitions
unoriented permutation - a mapping from {1,2,…,n} to a set L of n labels.
reversal – reverses the order of a segment of consecutive labels.
Definitions (cont.)
reversal distance – if p_1, p_2, …, p_t is a shortest series of reversals such that
α p_1 p_2 … p_t = β,
then t is the reversal distance of α with respect to β, denoted by d_β(α)
Example 1
2 1 3 7 5 4 8 6
1 2 3 4 5 6 7 8
• Assign labels 1 through 8 to the blocks in the lower chromosome
• Transfer the labels to the upper chromosome giving equal labels to homologous blocks
• We obtain a starting permutation in the upper chromosome and our goal is to sort it into the lower one, the identity
The figure above shows two chromosomes with homologous blocks
Example 1 (cont.)
2 1 3 7 5 4 8 6
1 2 3 7 5 4 8 6
1 2 3 4 5 7 8 6
1 2 3 4 5 7 6 8
1 2 3 4 5 6 7 8
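The series of reversals in Example 1 can be replayed with a few lines of Python. This is only a sketch: the function names `apply_reversal` and `apply_series` and the 0-based position pairs are assumptions made for illustration, not part of the slides.

```python
def apply_reversal(perm, i, j):
    """Return a copy of perm with the segment perm[i..j] (inclusive, 0-based) reversed."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


def apply_series(perm, reversals):
    """Apply the reversals left to right and return every intermediate permutation."""
    steps = [perm]
    for i, j in reversals:
        steps.append(apply_reversal(steps[-1], i, j))
    return steps


alpha = [2, 1, 3, 7, 5, 4, 8, 6]
# Positions of the reversed segments (2 1), (7 5 4), (8 6), (7 6) in Example 1.
series = [(0, 1), (3, 5), (6, 7), (5, 6)]
steps = apply_series(alpha, series)
for step in steps:
    print(*step)
```

The printed rows reproduce the five permutations listed in the example, ending with the identity.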
Best Solution?
How do we know that this is the shortest series of reversals?
To decide what the reversal distance should be, we look at the breakpoints
Breakpoints
A breakpoint of an unoriented permutation α is a pair of labels adjacent in α but not in the target.
In the case of the identity, this means adjacent labels that are not consecutive.
Example 2
Assume the identity is the target…
Breakpoints with oriented blocks:
L 5 2 1 3 4 R
Breakpoints with unoriented blocks:
L 5 2 1 3 4 R
Example 2 (cont.)
L 2 1 3 7 5 4 8 6 R
• b(α) denotes the number of breakpoints of α
• a reversal can remove at most two breakpoints, hence:
d(α) ≥ b(α) / 2
where d(α) is the reversal distance
• the permutation above has b(α) = 7, so d(α) ≥ 7/2; since d(α) is an integer, d(α) ≥ 4
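The breakpoint count and the resulting lower bound are easy to compute. The sketch below assumes the target is the identity and represents L and R as the labels 0 and n + 1; the names `breakpoints` and `lower_bound` are illustrative.

```python
import math


def breakpoints(perm):
    """Count breakpoints of perm against the identity, with L = 0 and R = n + 1 added."""
    ext = [0] + list(perm) + [len(perm) + 1]
    # Two adjacent labels form a breakpoint when they are not consecutive integers.
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def lower_bound(perm):
    """Each reversal removes at most two breakpoints, so d >= ceil(b / 2)."""
    return math.ceil(breakpoints(perm) / 2)


alpha = [2, 1, 3, 7, 5, 4, 8, 6]
print(breakpoints(alpha))  # 7
print(lower_bound(alpha))  # 4
```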
Strips
L 4 5 3 2 1 R
If we have two adjacent labels that do not make a breakpoint, they must be of the form:
… x (x+1) … or
… x (x-1) …
Strips (cont.)
strip – a sequence of consecutive labels surrounded by breakpoints but with no internal breakpoints
Two types of strips: increasing and decreasing
Special Rules
A single label surrounded by breakpoints is said to be a strip that is both increasing and decreasing
L and R are always considered part of an increasing strip, even if they are by themselves
L and R are considered a single element for the purpose of defining strips. If 0, 1, … is a strip and …, n, n+1 is a strip, we consider these two sequences as a single strip. They are linked by the common element L = R.
Example 3
L 1 2 8 7 3 5 6 4 R
Strips
increasing: (R,L,1,2) (5,6)
decreasing: (8,7)
both: (3) (4)
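A strip decomposition like the one in Example 3 can be sketched as follows. The function name `strips` and the encoding L = 0, R = n + 1 are assumptions. As a simplification, this sketch does not link the strips containing L and R into one strip, so (R,L,1,2) shows up as two increasing strips, [0, 1, 2] and [9].

```python
def strips(perm):
    """Split the extended permutation into strips and classify each one."""
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]
    result, start = [], 0
    for i in range(1, len(ext) + 1):
        # A strip ends at the end of the list or at a breakpoint.
        if i == len(ext) or abs(ext[i] - ext[i - 1]) != 1:
            strip = ext[start:i]
            if 0 in strip or n + 1 in strip:
                kind = "increasing"      # L and R always count as increasing
            elif len(strip) == 1:
                kind = "both"            # single label between two breakpoints
            elif strip[1] > strip[0]:
                kind = "increasing"
            else:
                kind = "decreasing"
            result.append((kind, strip))
            start = i
    return result


for kind, strip in strips([1, 2, 8, 7, 3, 5, 6, 4]):
    print(kind, strip)
```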
Theorem 1
If label k belongs to a decreasing strip and k - 1 belongs to an increasing strip, then there is a reversal that removes at least one breakpoint
L 4 5 2 3 1 7 6 R
k = 6, k - 1 = 5
Proof
Labels k – 1 and k must belong to different strips, since only single elements are said to be both increasing and decreasing.
The above statement implies that each one is the last element in its strip (each is followed by a breakpoint).
Proof (cont.)
Two possible schemes:
… (k - 1) … k …
… k … (k - 1) …
Performing a reversal on the area between the breakpoints brings k and k-1 together, reducing the number of breakpoints by at least one.
Example 4
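L 4 5 2 3 1 7 6 R
Here k = 6 lies in the decreasing strip (7 6) and k - 1 = 5 lies in the increasing strip (4 5). Reversing the segment 2 3 1 7 6 brings them together:
L 4 5 6 7 1 3 2 R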
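The reversal used in the proof of Theorem 1 cuts right after k - 1 and right after k, whichever of the two comes first. A minimal sketch, assuming the identity target and L = 0, R = n + 1 (the names `unite_with_predecessor` and `breakpoints` are illustrative):

```python
def breakpoints(perm):
    """Count breakpoints against the identity, with L = 0 and R = n + 1 added."""
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def unite_with_predecessor(perm, k):
    """Reverse the segment between the cut after k - 1 and the cut after k."""
    ext = [0] + list(perm) + [len(perm) + 1]
    i, j = sorted((ext.index(k - 1), ext.index(k)))
    # Reverse everything strictly after position i, up to and including position j.
    ext[i + 1:j + 1] = ext[i + 1:j + 1][::-1]
    return ext[1:-1]


alpha = [4, 5, 2, 3, 1, 7, 6]
after = unite_with_predecessor(alpha, 6)
print(after)                                  # [4, 5, 6, 7, 1, 3, 2]
print(breakpoints(alpha), breakpoints(after)) # 5 4
```

The function does not check the theorem's hypothesis (k in a decreasing strip, k - 1 in an increasing one); the caller is assumed to pick a suitable k.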
Observations
All permutations have at least one increasing strip (the one containing L and R)
Not every permutation has a decreasing strip
If there is a decreasing strip, let k be the smallest label in one; k - 1 cannot be in a decreasing strip, so it is in an increasing strip, and the previous proof shows that there is a breakpoint-removing reversal
Theorem 2
If label k belongs to a decreasing strip and k + 1 belongs to an increasing strip, then there is a reversal that removes at least one breakpoint.
L 5 4 2 3 1 6 7 R
k = 5, k + 1 = 6
Proof
Two possible schemes:
… (k + 1) … k …
… k … (k + 1) …
Performing a reversal on the area between the breakpoints brings k and k+1 together, reducing the number of breakpoints by at least one.
Example 5
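L 5 4 2 3 1 6 7 R
Here k = 5 lies in the decreasing strip (5 4) and k + 1 = 6 lies in the increasing strip (6 7 R). Reversing the segment 5 4 2 3 1 brings them together:
L 1 3 2 4 5 6 7 R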
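The reversal in Theorem 2 mirrors the one in Theorem 1: it cuts right before k and right before k + 1. A short sketch under the same assumptions as before (identity target, L = 0, R = n + 1; `unite_with_successor` is an illustrative name):

```python
def unite_with_successor(perm, k):
    """Reverse the segment between the cut before k and the cut before k + 1."""
    ext = [0] + list(perm) + [len(perm) + 1]
    i, j = sorted((ext.index(k), ext.index(k + 1)))
    # Reverse from position i up to, but not including, position j.
    ext[i:j] = ext[i:j][::-1]
    return ext[1:-1]


alpha = [5, 4, 2, 3, 1, 6, 7]
print(unite_with_successor(alpha, 5))  # [1, 3, 2, 4, 5, 6, 7]
```

In this instance the reversal also places 1 next to L, so two breakpoints disappear rather than one.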
The Result
The two proofs just explained show that, as long as we have decreasing strips, we can always reduce the number of breakpoints.
Notice that this also applies to single-element strips
What about when there are no decreasing strips?
Theorem 3
Let α be a permutation with a decreasing strip. If all reversals that remove breakpoints from α leave no decreasing strips, then there is a reversal that removes two breakpoints from α.
Proof
Let k be the smallest label involved in a decreasing strip.
p is the reversal uniting k and k - 1
k – 1 must be to the left of k, otherwise p does not touch the strip containing k, and a decreasing strip remains.
… (k – 1) … k …
Proof (cont.)
Let ℓ be the largest label involved in a decreasing strip.
σ is the reversal uniting ℓ and ℓ + 1
ℓ + 1 must be to the right of ℓ, otherwise σ does not touch the strip containing ℓ, and a decreasing strip remains.
… ℓ … (ℓ + 1) …
Proof (cont.)
Observe that k must be inside the interval reversed by σ, otherwise σ would leave k ’s decreasing strip intact.
Likewise, ℓ must belong to the interval of p
… (k – 1) ℓ … k (ℓ + 1) …
Proof (cont.)
… (k – 1) ℓ … k (ℓ + 1) …
We can see that p = σ must be true: both reversals cut between k – 1 and ℓ and between k and ℓ + 1
The reversal removes two breakpoints because k is united with k – 1 and ℓ is united with ℓ + 1
Example 6
L 7 8 3 5 4 6 1 2 R
Reversals that remove breakpoints
L 7 8 3 5 4 6 1 2 R
L 7 8 3 4 5 6 1 2 R
k - 1 = 3, ℓ = 5, k = 4, ℓ + 1 = 6
Sorting a Permutation
We can use an algorithm that sorts a permutation using at most 2 * d(α) reversals (that is, twice as many reversals as the minimum possible)
The algorithm assumes that the target is the identity (1, 2, 3, 4, …)
General Idea
A main loop looks at the current permutation and selects the best possible reversal to apply
Update the current permutation and report the reversal applied
The loop stops when the current permutation is the identity
Choosing the Reversal
If there is a decreasing strip, look for a reversal that reduces the number of breakpoints and leaves a decreasing strip.
If no such reversal exists, there is a reversal that encompasses all the decreasing strips and removes two breakpoints.
If there are no decreasing strips, select a reversal that cuts two breakpoints.
Sorting Algorithm
Algorithm: Sorting Unoriented Permutation
input: permutation α
output: series of reversals that sort α
list ← empty
while α ≠ I do
    if α has a decreasing strip then
        k ← smallest label in a decreasing strip
        p ← reversal that cuts after k and after k - 1
        if αp has no decreasing strip then
            ℓ ← largest label in a decreasing strip
            p ← reversal that cuts before ℓ and before ℓ + 1
    else
        p ← reversal that cuts the first two breakpoints
    α ← αp
    list ← list + p
return list
α = L 1 2 . 8 7 . 3 . 5 6 . 4 . R (a dot marks a breakpoint)
list ← empty

k ← 3, p ← (8 7 3)
αp = L 1 2 3 . 7 8 . 5 6 . 4 . R
α ← αp, list ← (8 7 3)

k ← 4, p ← (7 8 5 6 4)
αp = L 1 2 3 4 . 6 5 . 8 7 . R
α ← αp, list ← (8 7 3), (7 8 5 6 4)

k ← 5, p ← (6 5)
αp = L 1 2 3 4 5 6 . 8 7 . R
α ← αp, list ← (8 7 3), (7 8 5 6 4), (6 5)

k ← 7, p ← (8 7)
αp = L 1 2 3 4 5 6 7 8 R
α ← αp, list ← (8 7 3), (7 8 5 6 4), (6 5), (8 7)
Another Example
α = L . 2 1 . 3 . 7 . 5 4 . 8 . 6 . R
list ← empty

k ← 1, p ← (2 1)
αp = L 1 2 3 . 7 . 5 4 . 8 . 6 . R
α ← αp, list ← (2 1)

k ← 4, p ← (7 5 4)
αp = L 1 2 3 4 5 . 7 8 . 6 . R
α ← αp, list ← (2 1), (7 5 4)

k ← 6, p ← (7 8 6)
αp = L 1 2 3 4 5 6 . 8 7 . R
α ← αp, list ← (2 1), (7 5 4), (7 8 6)

k ← 7, p ← (8 7)
αp = L 1 2 3 4 5 6 7 8 R
α ← αp, list ← (2 1), (7 5 4), (7 8 6), (8 7)
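The sorting algorithm and its two traces can be reproduced with a short Python sketch. It assumes the identity target and encodes L and R as 0 and n + 1. The names `decreasing_labels` and `sort_by_reversals` are illustrative, and each reversal is reported as the segment it reverses, as in the traces. A single label between two breakpoints is treated as a decreasing strip, which matches the choice k = 3 in the first trace.

```python
def decreasing_labels(ext):
    """Labels in decreasing strips; singletons count, L and R never do."""
    return [x for i, x in enumerate(ext[1:-1], start=1)
            if ext[i - 1] != x - 1 and ext[i + 1] != x + 1]


def sort_by_reversals(perm):
    """Return the reversed segments, in order, that sort perm into the identity."""
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]
    identity = list(range(n + 2))
    reversals = []
    while ext != identity:
        dec = decreasing_labels(ext)
        if dec:
            k = min(dec)
            i, j = sorted((ext.index(k - 1), ext.index(k)))
            lo, hi = i + 1, j + 1          # cut after k - 1 and after k
            trial = ext[:lo] + ext[lo:hi][::-1] + ext[hi:]
            if not decreasing_labels(trial):
                big = max(dec)             # the label called l in the pseudocode
                i, j = sorted((ext.index(big), ext.index(big + 1)))
                lo, hi = i, j              # cut before l and before l + 1
        else:
            # No decreasing strip: reverse the strip between the first two breakpoints.
            cuts = [i for i in range(n + 1) if abs(ext[i] - ext[i + 1]) != 1][:2]
            lo, hi = cuts[0] + 1, cuts[1] + 1
        reversals.append(tuple(ext[lo:hi]))
        ext = ext[:lo] + ext[lo:hi][::-1] + ext[hi:]
    return reversals


print(sort_by_reversals([1, 2, 8, 7, 3, 5, 6, 4]))
# [(8, 7, 3), (7, 8, 5, 6, 4), (6, 5), (8, 7)]
print(sort_by_reversals([2, 1, 3, 7, 5, 4, 8, 6]))
# [(2, 1), (7, 5, 4), (7, 8, 6), (8, 7)]
```

Rescanning the whole permutation on every iteration keeps the sketch short; it is not meant to be efficient.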
But is it Optimal?
It has been shown: d(α) ≥ b(α) / 2
For the previous example: b(α) = 7, so d(α) ≥ 4
Although the algorithm produces the optimal result in this instance, it is not guaranteed to do so. The algorithm may produce a list containing more reversals than are actually necessary to solve the problem.
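For small permutations the exact reversal distance can be found by brute force, which gives a way to check whether a particular list of reversals is optimal. The breadth-first search below is a sketch for illustration only (the name `reversal_distance` is an assumption); it explores up to n! permutations, so it is practical only for short inputs.

```python
from collections import deque


def reversal_distance(perm):
    """Exact reversal distance to the sorted order, by breadth-first search."""
    start, target = tuple(perm), tuple(sorted(perm))
    if start == target:
        return 0
    n = len(start)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        cur, dist = queue.popleft()
        # Try every reversal of a segment with at least two labels.
        for i in range(n - 1):
            for j in range(i + 2, n + 1):
                nxt = cur[:i] + cur[i:j][::-1] + cur[j:]
                if nxt == target:
                    return dist + 1
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))


print(reversal_distance([2, 1, 3, 7, 5, 4, 8, 6]))  # 4
```

For this permutation the search confirms that four reversals are the minimum, in agreement with the lower bound d(α) ≥ 4.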
Theorem 4
The number of iterations in algorithm Sorting Unoriented Permutation is less than or equal to the number of breakpoints in the initial permutation
Proof
We must prove that, on average, each iteration removes at least one breakpoint.
This is true because the only time an iteration removes 0 breakpoints is immediately after an iteration that removed 2, which keeps the average at 1 breakpoint per iteration. Since b(α) ≤ 2 d(α), the algorithm uses at most 2 * d(α) reversals.