![Page 1: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/1.jpg)
Algorithms for Generalized Comparison of Minisatellites
Behshad Behzadi & Jean-Marc Steyaert
LIX, Ecole Polytechnique, France
![Page 2: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/2.jpg)
Outline
• Biology
• Evolutionary Model
• Problem description
• Previous works
• Algorithms
• Results
![Page 3: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/3.jpg)
Biology…
• Minisatellites consist of tandem arrays of short repeat units found in the genomes of most higher eukaryotes.
• The high degree of polymorphism at minisatellites has applications ranging from forensic studies to investigating the origin of modern humans.
![Page 4: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/4.jpg)
…Biology…
• The different types of repeat units are called variants.
• MVR-PCR (minisatellite variant repeat mapping by PCR) is designed to identify the variants.
• As an example, MSY1 is a minisatellite on the human Y chromosome. There are five different repeats (variants) in MSY1.
![Page 5: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/5.jpg)
Different Repeat Types (Variants) of MSY1
![Page 6: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/6.jpg)
Graphical representations of Minisatellite Maps of 13 males
![Page 7: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/7.jpg)
Summary
• Biologists can determine minisatellite maps: sequences in which each repeat is replaced by the symbol of its variant.
• Studying the evolution of minisatellites is an important problem in human genetics.
![Page 8: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/8.jpg)
Computer Science Model
• Each variant is a symbol of an alphabet.
• A minisatellite is a string over this alphabet.
• We need to compare these strings.
![Page 9: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/9.jpg)
Evolutionary Operations
• Insertion
• Deletion
• Mutation
• Amplification (p-plication)
• Contraction (p-contraction)
![Page 10: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/10.jpg)
Examples of operations(1)
• Insertion of d: abbc → abbdc
• Deletion of c: abbcb → abbb
• Mutation of c into d: caab → daab
![Page 11: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/11.jpg)
Examples of operations(2)
• 4-plication of c: abcb → abccccb
• 2-contraction of b: abbc → abc
• Only single symbols are amplified or contracted: there is no subword replication or contraction.
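The five operations can be sketched as small string functions. This is only an illustration of the examples on these two slides; the function names and the 0-based `pos` argument are assumptions, not notation from the talk.

```python
def insert(s: str, pos: int, x: str) -> str:
    """Insert symbol x so that it ends up at index pos."""
    return s[:pos] + x + s[pos:]


def delete(s: str, pos: int) -> str:
    """Delete the symbol at index pos."""
    return s[:pos] + s[pos + 1:]


def mutate(s: str, pos: int, y: str) -> str:
    """Mutate the symbol at index pos into y."""
    return s[:pos] + y + s[pos + 1:]


def amplify(s: str, pos: int, p: int) -> str:
    """p-plication: replace the single symbol at pos by p copies of it."""
    return s[:pos] + s[pos] * p + s[pos + 1:]


def contract(s: str, pos: int, p: int) -> str:
    """p-contraction: replace p consecutive copies starting at pos by one copy."""
    if s[pos:pos + p] != s[pos] * p:
        raise ValueError("no run of p identical symbols at this position")
    return s[:pos + 1] + s[pos + p:]


# The examples from the slides:
print(insert("abbc", 3, "d"))   # abbdc
print(delete("abbcb", 3))       # abbb
print(mutate("caab", 0, "d"))   # daab
print(amplify("abcb", 2, 4))    # abccccb
print(contract("abbc", 1, 2))   # abc
```

Amplification and contraction act on one symbol only, which matches the restriction that subwords are never replicated or contracted.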
![Page 12: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/12.jpg)
Cost Functions
• I(x): insertion of letter x
• D(x): deletion of letter x
• M(x,y): mutation of x to y
• A_p(x): p-plication of letter x
• C_p(x): p-contraction of letter x
![Page 13: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/13.jpg)
Hypotheses
• All the costs are positive.
• The cost of duplications (and contractions) is less than that of all other operations.
• Distance: M(x,x) = 0
• Triangle inequality holds: M(x,y) + M(y,z) ≥ M(x,z)
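A small checker makes these hypotheses concrete. It is a sketch under assumptions: the dictionary layout (`ins[x]`, `dele[x]`, `mut[x, y]`, `amp[p, x]`, `con[p, x]`) is hypothetical, and "less than all other operations" is read here as "every amplification or contraction cost is below every insertion, deletion and non-identity mutation cost".

```python
from itertools import product


def violated_hypotheses(alphabet, ins, dele, mut, amp, con):
    """Return the names of the hypotheses that the given cost tables violate."""
    problems = []
    # Costs of insertions, deletions and mutations between distinct symbols.
    other = ([ins[x] for x in alphabet] + [dele[x] for x in alphabet]
             + [mut[x, y] for x in alphabet for y in alphabet if x != y])
    # Costs of all amplifications and contractions.
    cheap = list(amp.values()) + list(con.values())

    if any(c <= 0 for c in other + cheap):
        problems.append("positivity")
    if cheap and other and max(cheap) >= min(other):
        problems.append("duplication cheaper")
    if any(mut[x, x] != 0 for x in alphabet):
        problems.append("identity")
    if any(mut[x, y] + mut[y, z] < mut[x, z]
           for x, y, z in product(alphabet, repeat=3)):
        problems.append("triangle")
    return problems


# Hypothetical cost tables, chosen only for illustration.
alphabet = "abc"
ins = {x: 20 for x in alphabet}
dele = {x: 20 for x in alphabet}
mut = {(x, y): (0 if x == y else 5) for x in alphabet for y in alphabet}
amp = {(2, x): 1 for x in alphabet}
con = {(2, x): 1 for x in alphabet}

print(violated_hypotheses(alphabet, ins, dele, mut, amp, con))  # []
mut["a", "c"] = 20  # now M(a,b) + M(b,c) = 10 < M(a,c)
print(violated_hypotheses(alphabet, ins, dele, mut, amp, con))  # ['triangle']
```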
![Page 14: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/14.jpg)
Transformation of s into t
• Applying a sequence of operations on s transforming it into t.
• An example transforming xyy into xbcbxzc:
xyy → xy → xxy → xxz → xbxz → xbbxz → xbbbxz → xbcbxz → xbcbxzc
• The cost of a transformation is the sum of costs of its operations.
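As a worked instance of this sum, the example transformation of xyy into xbcbxzc can be priced step by step. The costs below are hypothetical and, unlike the cost functions of the talk, do not depend on the symbol; the two steps from xbxz to xbbbxz are read as two successive duplications.

```python
# Hypothetical per-operation costs, chosen only for illustration.
COST = {
    "insertion": 20,
    "deletion": 20,
    "mutation": 10,
    "amplification": 1,
    "contraction": 1,
}

# Each step: (operation applied, string obtained after the step).
STEPS = [
    ("contraction", "xy"),        # xyy    -> xy      (2-contraction of y)
    ("amplification", "xxy"),     # xy     -> xxy     (duplication of x)
    ("mutation", "xxz"),          # xxy    -> xxz     (y mutated into z)
    ("insertion", "xbxz"),        # xxz    -> xbxz    (insertion of b)
    ("amplification", "xbbxz"),   # xbxz   -> xbbxz   (duplication of b)
    ("amplification", "xbbbxz"),  # xbbxz  -> xbbbxz  (duplication of b)
    ("mutation", "xbcbxz"),       # xbbbxz -> xbcbxz  (b mutated into c)
    ("insertion", "xbcbxzc"),     # xbcbxz -> xbcbxzc (insertion of c)
]


def transformation_cost(steps, cost):
    """Sum the costs of the operations of a transformation."""
    return sum(cost[op] for op, _ in steps)


print(transformation_cost(STEPS, COST))  # 64
```

With these numbers the four cheap operations contribute 4, the two mutations 20 and the two insertions 40.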
![Page 15: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/15.jpg)
Transformation distance between s and t
• TD(s,t) = minimum cost over all transformations of s into t.
• A transformation that achieves this minimum is called an optimal transformation.
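For very short strings, this definition can be checked by exhaustive search. The sketch below is not the dynamic-programming algorithm of the talk: it swaps in Dijkstra's algorithm over the graph whose nodes are strings and whose edges are single operations. The uniform costs, the bound `max_len` on intermediate string length, and the bound `max_p` on the order of amplifications and contractions are all assumptions made to keep the search finite.

```python
import heapq

# Hypothetical uniform costs, chosen only for illustration.
EXAMPLE_COSTS = {"ins": 20, "del": 20, "mut": 10, "amp": 1, "con": 1}


def brute_force_td(s, t, alphabet, cost, max_p=2, max_len=None):
    """Minimum cost of transforming s into t, restricted to intermediate
    strings of length at most max_len. Exponential; tiny inputs only."""
    if max_len is None:
        max_len = max(len(s), len(t)) + 1

    def neighbours(u):
        n = len(u)
        if n < max_len:
            for i in range(n + 1):
                for x in alphabet:
                    yield u[:i] + x + u[i:], cost["ins"]
        for i in range(n):
            yield u[:i] + u[i + 1:], cost["del"]
            for y in alphabet:
                if y != u[i]:
                    yield u[:i] + y + u[i + 1:], cost["mut"]
            for p in range(2, max_p + 1):
                if n + p - 1 <= max_len:  # p-plication of u[i]
                    yield u[:i] + u[i] * p + u[i + 1:], cost["amp"]
                if u[i:i + p] == u[i] * p:  # p-contraction at i
                    yield u[:i + 1] + u[i + p:], cost["con"]

    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == t:
            return d
        if d > dist[u]:
            continue  # stale heap entry
        for v, c in neighbours(u):
            if d + c < dist.get(v, float("inf")):
                dist[v] = d + c
                heapq.heappush(heap, (d + c, v))
    return None  # t is not reachable within the bounds


print(brute_force_td("abbc", "abc", "abc", EXAMPLE_COSTS))               # 1
print(brute_force_td("abcb", "abccccb", "abc", EXAMPLE_COSTS))           # 3
print(brute_force_td("abcb", "abccccb", "abc", EXAMPLE_COSTS, max_p=4))  # 1
```

The last two calls show the effect of allowing amplifications of order greater than 2: a single 4-plication replaces three duplications.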
![Page 16: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/16.jpg)
Previous Works
• Jobling et al.
• Bérard & Rivals (RECOMB'02)
• B.B. & J.M.S. (CPM 2003, WABI 2004): this work
![Page 17: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/17.jpg)
Optimal Transformation between s and t
• In any transformation of s into t, each symbol of s is of one of two types.
• Generative vs. vanishing letters of s:
– a generative letter creates a substring of t (generation);
– a vanishing letter disappears (reduction).
![Page 18: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/18.jpg)
Basic lemmas
• The optimal generation of a non-empty string s from a symbol x can be achieved by a non-decreasing generation.
• In an optimal transformation of a string s to a string t any contraction operation can be done before any generation.
![Page 19: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/19.jpg)
The schema of the proof
• The sequence u is eliminated at some point during the process.
• The right-hand side transformation is equivalent and less expensive w.r.t. evolution.
![Page 20: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/20.jpg)
Optimal Transformation
• Generative and vanishing symbols can be transformed in two distinct optimized phases.
![Page 21: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/21.jpg)
The Algorithm
• Preprocessing (substring generation costs) by dynamic programming
• Main part (transformation distance) by dynamic programming
![Page 22: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/22.jpg)
Substring generation costs
• G[i,j,x] is the minimum generation cost of t[i..j] from symbol x among all generations that do not start with a mutation.
• T[i,j,x] is the minimum generation cost of substring t[i..j] from symbol x.
• mc[i,j,p,x] is the minimum generation cost of t[i..j] from symbol x among all generations that start with a p-plication.
![Page 23: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/23.jpg)
mc[i, j, p, x]
![Page 24: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/24.jpg)
Substring generation costs
![Page 25: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/25.jpg)
Substring Reduction cost
• S[i,j] is the minimum cost of reduction of the substring s[i..j] into s[i].
• S[i,j] is determined in the same way.
![Page 26: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/26.jpg)
Complexity
• The time complexity is
• The space complexity is
• The maximum possible p for a p-plication is noted by
![Page 27: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/27.jpg)
Transformation Distance
– TD[i,j] is the transformation distance between s[1..i] and t[1..j].
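The recurrence itself is not preserved in this transcript. As a point of comparison only, the sketch below fills a table of the same shape using the classical edit-distance recurrence, which allows insertions, deletions and mutations but no amplifications or contractions. Since it uses only a subset of the operations, each entry is an upper bound on the corresponding TD[i,j]; the cost functions passed in are hypothetical.

```python
def edit_only_table(s, t, ins, dele, mut):
    """E[i][j]: minimum cost of turning s[:i] into t[:j] using only
    insertions, deletions and mutations (no amplification/contraction)."""
    m, n = len(s), len(t)
    E = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        E[i][0] = E[i - 1][0] + dele(s[i - 1])
    for j in range(1, n + 1):
        E[0][j] = E[0][j - 1] + ins(t[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            E[i][j] = min(
                E[i - 1][j] + dele(s[i - 1]),               # delete s[i]
                E[i][j - 1] + ins(t[j - 1]),                # insert t[j]
                E[i - 1][j - 1] + mut(s[i - 1], t[j - 1]),  # mutate or match
            )
    return E


# Hypothetical cost functions, chosen only for illustration.
def ins_cost(x):
    return 20


def del_cost(x):
    return 20


def mut_cost(x, y):
    return 0 if x == y else 10


print(edit_only_table("caab", "daab", ins_cost, del_cost, mut_cost)[-1][-1])  # 10
print(edit_only_table("abbc", "abc", ins_cost, del_cost, mut_cost)[-1][-1])   # 20
```

In the second call the baseline pays for a full deletion, whereas a model with contractions can remove the repeated b at the (cheaper) contraction cost.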
![Page 28: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/28.jpg)
Complexity
• The main algorithm complexity is O(n³) in time and O(n²) in space.
• The total time complexity is
• The total space complexity is
![Page 29: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/29.jpg)
Further improvements
• Improving the complexity using the run-length encoded (RLE) string representation.
• The RLE of aaaabbbbcccabbbbcc is a⁴b⁴c³a¹b⁴c², also written a⁴b⁴c³ab⁴c².
• The lengths of the encoded strings with original lengths m and n are denoted by m' and n'.
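A run-length encoding can be computed in a few lines. This is a minimal sketch using the standard library; the function names are assumptions, and the exponents are printed as plain digits.

```python
from itertools import groupby


def rle(s):
    """Run-length encode s as a list of (symbol, run length) pairs."""
    return [(sym, len(list(run))) for sym, run in groupby(s)]


def rle_text(s):
    """Compact textual form of the encoding, e.g. 'a4b4c3a1b4c2'."""
    return "".join(f"{sym}{k}" for sym, k in rle(s))


s = "aaaabbbbcccabbbbcc"
print(rle_text(s))          # a4b4c3a1b4c2
print(len(s), len(rle(s)))  # 18 6
```

Here the original length is 18 and the encoded length is 6, the number of runs.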
![Page 30: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/30.jpg)
Generation of Runs
• There exists an optimal generation of a non-empty string t from a single symbol x in which for every run of size k > 1 in t the k-1 right symbols of the run are generated by duplications of the leftmost symbol of the run.
![Page 31: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/31.jpg)
New configurations in the transformation
• Generations could split runs into several parts...
• Similarly for reductions...
• The following examples show the different configurations.
![Page 32: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/32.jpg)
[Figure: configuration diagram; surviving labels: x, x]
![Page 33: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/33.jpg)
[Figure: configuration diagram; surviving labels: x, y, i, j, i', j']
![Page 34: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/34.jpg)
![Page 35: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/35.jpg)
PreProcessing: Generation Costs
• Compute the generation cost of all substrings of the target string t from any symbol x of the alphabet.
• Fill a table G_t[x,i,j] by recurrence.
![Page 36: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/36.jpg)
![Page 37: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/37.jpg)
![Page 38: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/38.jpg)
Core Algorithm
• The transformation distances TD between s[1..i] and t[1..j] are computed by recurrence, using lemmas derived for the different configurations.
• Generalized dynamic programming is used again.
• Complexity: O(n'³ + m'³ + mn'² + nm'² + mn)
![Page 39: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/39.jpg)
BS (Behzadi & Steyaert) algorithms vs. BR (Bérard & Rivals) algorithms
• Complexity improvement: from O(n⁴) to O(n³), and further with RLE (O(n²) experimentally)
• Generalization 1: amplifications and contractions of order > 2
• Generalization 2: symbol-dependent cost functions
• The triangle inequality hypothesis on the cost functions is not restrictive: it can be relaxed with some preprocessing.
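The slides do not say which preprocessing is meant. One plausible form, shown here purely as an assumption, is to replace each mutation cost M(x,z) by the cheapest chain of mutations from x to z, computed with the Floyd-Warshall algorithm; the resulting costs satisfy the triangle inequality by construction.

```python
def metric_closure(alphabet, mut):
    """Return mutation costs closed under chaining: closed[x, z] is the
    cheapest cost of turning x into z through any sequence of mutations."""
    closed = dict(mut)
    for y in alphabet:          # intermediate symbol
        for x in alphabet:
            for z in alphabet:
                via = closed[x, y] + closed[y, z]
                if via < closed[x, z]:
                    closed[x, z] = via
    return closed


# Hypothetical costs violating the triangle inequality: M(a,c) = 20 > 5 + 5.
alphabet = "abc"
mut = {(x, y): (0 if x == y else 5) for x in alphabet for y in alphabet}
mut["a", "c"] = 20

closed = metric_closure(alphabet, mut)
print(closed["a", "c"])  # 10
```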
![Page 40: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/40.jpg)
Dataset
• Provided by Prof. M. Jobling
• Minisatellite maps of 690 Y chromosomes from worldwide populations.
• The sequence lengths range from 48 to 118.
• Distances were computed for 690 × 690 pairs.
![Page 41: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/41.jpg)
Running times
| Algorithm | Minisatellites data: core | Minisatellites data: pre-processing | Random sequences: core | Random sequences: pre-processing |
|---|---|---|---|---|
| This work | 32.54 sec | 0.03 sec | 810.93 sec | 2.14 sec |
| Behzadi & Steyaert (2003) | 1012.38 sec | 2.49 sec | 1014.32 sec | 2.51 sec |
| Bérard & Rivals (2002) | 1062.23 sec | 16.37 sec | 1058.44 sec | 15.90 sec |
![Page 42: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/42.jpg)
Conclusion
• A more efficient algorithm that computes the distances, and thus the phylogenetic trees, faster.
• A more general framework that can be used to model more complicated biological evolution.
![Page 43: Algorithms for Generalized Comparison of Minisatellites Behshad Behzadi & Jean-Marc Steyaert LIX, Ecole Polytechnique France.](https://reader036.fdocuments.in/reader036/viewer/2022062721/56649f1d5503460f94c33665/html5/thumbnails/43.jpg)
Thank you