A Linear-Time Algorithm for the Perfect Phylogeny Haplotyping (PPH) Problem Zhihong Ding, Vladimir...
A Linear-Time Algorithm for the Perfect Phylogeny Haplotyping (PPH) Problem
Zhihong Ding, Vladimir Filkov, Dan Gusfield
RECOMB 2005, pp. 585–600
Date: Nov. 23, 2005
Introducer: Hsing-Yen Ann
Modified from: http://wwwcsif.cs.ucdavis.edu/~gusfield/LPPH_RECOMB05.ppt
Abstract
Since the introduction of the Perfect Phylogeny Haplotyping (PPH) Problem in RECOMB 2002, the problem of finding a linear-time (deterministic, worst-case) solution for it has remained open, despite broad interest in the PPH problem and a series of papers on various aspects of it. In this paper we solve the open problem, giving a practical, deterministic linear-time algorithm based on a simple data structure and simple operations on it. The method is straightforward to program and has been fully implemented. Simulations show that it is much faster in practice than prior methods. The value of a linear-time solution to the PPH problem is partly conceptual and partly for use in the inner loop of algorithms for more complex problems, where the PPH problem must be solved repeatedly.
Haplotypes to Genotypes

Sites:       1 2 3 4 5 6 7 8 9
Haplotype 1: 0 1 1 1 0 0 1 1 0
Haplotype 2: 1 1 0 1 0 0 1 0 0
Genotype:    2 1 2 1 0 0 1 2 0

Two haplotypes per individual.
Merging the haplotypes (experimental results) gives the genotype for the individual:
two 0s → 0; two 1s → 1; one 0 + one 1 → 2
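
The merge rule on this slide can be written as a short Python sketch. The function name `merge_haplotypes` is an illustrative choice, not something from the slides; the two haplotypes are the ones shown above.

```python
def merge_haplotypes(h1, h2):
    """Combine two 0/1 haplotypes into a 0/1/2 genotype, site by site."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same sites")
    # Equal alleles keep their value (0 or 1); differing alleles become 2.
    return [a if a == b else 2 for a, b in zip(h1, h2)]


hap1 = [0, 1, 1, 1, 0, 0, 1, 1, 0]
hap2 = [1, 1, 0, 1, 0, 0, 1, 0, 0]
genotype = merge_haplotypes(hap1, hap2)
print(genotype)  # [2, 1, 2, 1, 0, 0, 1, 2, 0]
```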
Genotypes to Haplotypes

Sites:       1 2 3 4 5 6 7 8 9
Haplotype 1: 0 1 1 1 0 0 1 1 0
Haplotype 2: 1 1 0 1 0 0 1 0 0
Genotype:    2 1 2 1 0 0 1 2 0

Two haplotypes per individual; one genotype for the individual.
0 → (0, 0); 1 → (1, 1); 2 → (1, 0) or (0, 1)
2^k possible solutions!!

Haplotype Inference Problem: Given a set of n genotypes (on the same sites), determine the original set of n haplotype pairs that generated the n genotypes.
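
To see where the exponential count comes from, here is a small Python sketch that enumerates every ordered haplotype pair consistent with one genotype. It assumes k counts the sites with value 2; `expand_genotype` is a hypothetical helper name. Counting ordered pairs gives 2^k; swapping the two haplotypes of a pair gives the same individual, so there are 2^(k-1) unordered pairs when k ≥ 1.

```python
from itertools import product


def expand_genotype(genotype):
    """Yield every ordered haplotype pair (h1, h2) consistent with a 0/1/2 genotype."""
    het_sites = [i for i, g in enumerate(genotype) if g == 2]
    for choice in product((0, 1), repeat=len(het_sites)):
        h1 = list(genotype)
        h2 = list(genotype)
        # At each site with value 2, one haplotype gets the chosen allele
        # and the other haplotype gets the opposite allele.
        for site, allele in zip(het_sites, choice):
            h1[site] = allele
            h2[site] = 1 - allele
        yield tuple(h1), tuple(h2)


genotype = [2, 1, 2, 1, 0, 0, 1, 2, 0]
pairs = list(expand_genotype(genotype))
print(len(pairs))  # 8, i.e. 2^3 for the three sites with value 2
```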
The Perfect Phylogeny Model of Haplotype Evolution

[Figure: a tree rooted at the ancestral haplotype 00000 over sites 1 2 3 4 5. Site mutations 1, 2, 3, 4 and 5 label the edges; the extant haplotypes at the leaves are 10100, 10000, 01011, 00010 and 01010.]

Perfect: Never mutate twice on the same site
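
A set of haplotypes can be tested against this model without building the tree. The sketch below uses a standard characterization that is not stated on the slides: with an all-zero ancestral haplotype, a perfect phylogeny exists exactly when no pair of sites shows all three combinations (0,1), (1,0) and (1,1). The function name is hypothetical; the leaf haplotypes are the five from the figure.

```python
from itertools import combinations


def fits_rooted_perfect_phylogeny(haplotypes):
    """Return True if no pair of sites shows all three of (0,1), (1,0), (1,1)."""
    n_sites = len(haplotypes[0])
    forbidden = {(0, 1), (1, 0), (1, 1)}
    for i, j in combinations(range(n_sites), 2):
        seen = {(h[i], h[j]) for h in haplotypes}
        if forbidden <= seen:
            return False
    return True


leaves = ["10100", "10000", "01011", "00010", "01010"]
haplotypes = [[int(c) for c in leaf] for leaf in leaves]
print(fits_rooted_perfect_phylogeny(haplotypes))  # True

# Adding 11000 makes sites 1 and 2 show (1,0), (0,1) and (1,1).
print(fits_rooted_perfect_phylogeny(haplotypes + [[1, 1, 0, 0, 0]]))  # False
```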
The Perfect Phylogeny Haplotyping (PPH) Problem

Given a set of genotypes, find an explaining set of haplotypes that fits a perfect phylogeny.

Genotype matrix
Site: 1 2
a:    2 2
b:    0 2
c:    1 0

Haplotype matrix
Site: 1 2
a:    1 0
a:    0 1
b:    0 0
b:    0 1
c:    1 0
c:    1 0

Perfect phylogeny
[Figure: root 00, labeled (b); edge 1 leads to leaf 10, labeled (a,c,c); edge 2 leads to leaf 01, labeled (a,b).]
The Perfection

An example that does not fit a perfect phylogeny

Genotype matrix
Site: 1 2
a:    2 2
b:    0 2
c:    1 0

Haplotype matrix (Not Perfect!!)
Site: 1 2
a:    1 1
a:    0 0
b:    0 0
b:    0 1
c:    1 0
c:    1 0

[Figure: root 00, labeled (a,b); edge 1 leads to 10, labeled (c,c); edge 2 leads to 01, labeled (b). Haplotype 11, labeled (a), is drawn twice: below 10 via a second edge 2, and below 01 via a second edge 1.]
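
The two examples above can be checked by brute force. The Python sketch below tries every way of resolving the genotypes and keeps the resolutions whose haplotypes fit a perfect phylogeny. It is exponential and only illustrates the problem statement; it is not the linear-time algorithm of the paper. Assumptions: the genotype rows a, b, c are my reading of the example slide, the ancestral haplotype is all-zero, and all function names are hypothetical.

```python
from itertools import combinations, product


def fits_rooted_perfect_phylogeny(haplotypes):
    """True if no pair of sites shows all three of (0,1), (1,0), (1,1)."""
    forbidden = {(0, 1), (1, 0), (1, 1)}
    for i, j in combinations(range(len(haplotypes[0])), 2):
        if forbidden <= {(h[i], h[j]) for h in haplotypes}:
            return False
    return True


def resolutions(genotype):
    """Yield the unordered haplotype pairs that explain one genotype."""
    het = [i for i, g in enumerate(genotype) if g == 2]
    free = het[1:]  # the first site with value 2 is fixed to avoid mirrored pairs
    for choice in product((0, 1), repeat=len(free)):
        h1, h2 = list(genotype), list(genotype)
        if het:
            h1[het[0]], h2[het[0]] = 0, 1
        for site, allele in zip(free, choice):
            h1[site], h2[site] = allele, 1 - allele
        yield tuple(h1), tuple(h2)


def brute_force_pph(genotypes):
    """Return every combination of resolutions that fits a perfect phylogeny."""
    solutions = []
    for combo in product(*(list(resolutions(g)) for g in genotypes)):
        haplotypes = [h for pair in combo for h in pair]
        if fits_rooted_perfect_phylogeny(haplotypes):
            solutions.append(combo)
    return solutions


genotypes = [[2, 2], [0, 2], [1, 0]]  # rows a, b, c
solutions = brute_force_pph(genotypes)
for solution in solutions:
    print(solution)
```

Under these assumptions only the resolution of a into 01 and 10 survives; resolving a into 00 and 11 is rejected, which matches the "Not Perfect!!" slide.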
Prior Work

Several existing algorithms:
- A complex nearly-linear-time algorithm with a little bug runs in O(nm α(nm)) time.
- Two simpler but slower algorithms run in O(nm^2) time.

Contribution of this paper:
- A linear-time (O(nm)) algorithm.
- Use a simple data structure, the Shadow Tree, and some simple operations on it.
Shadow Tree (1/7 to 7/7)

[Figure, seven animation frames: a shadow tree hanging from "root", with edges labeled 1 to 5, each label appearing twice. Legend: tree edge, shadow edge, class, free link, flipping, fixed link, classes merge.]
The Algorithm
- Process the genotype matrix one row at a time, starting at the first row, and modify the shadow tree.
- While processing an element in one row, there are at most 4+3 cases, and all the cases can be done in constant time.
- Assumption: The genotype matrix only contains entries of value 0 and 2.
OldEntryList

Genotype Matrix
Sites: 1 2 3 4 5
Row 1: 2 2 2 0 0
Row 2: 2 0 0 2 2
Row 3: 2 2 2 0 2
Row 4: 2 0 0 2 0

OldEntryList for row 3: 1, 2, 3, 5

OldEntryList: column indices that have entries of value 2 in this row and also have entries of value 2 in some previous rows
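
The definition above can be sketched in Python with one pass over the matrix, remembering which columns have already held a 2. The matrix literal is my reading of the slide's genotype matrix (four rows over sites 1 to 5), and `old_entry_lists` is a hypothetical name. Columns are reported in left-to-right order, which may differ from the order the algorithm itself uses.

```python
def old_entry_lists(matrix):
    """For each row, list the 1-based columns holding a 2 here and in some earlier row."""
    seen_two = set()  # columns that held a 2 in an already processed row
    result = []
    for row in matrix:
        twos = [col for col, value in enumerate(row, start=1) if value == 2]
        result.append([col for col in twos if col in seen_two])
        seen_two.update(twos)
    return result


matrix = [
    [2, 2, 2, 0, 0],
    [2, 0, 0, 2, 2],
    [2, 2, 2, 0, 2],
    [2, 0, 0, 2, 0],
]
lists = old_entry_lists(matrix)
print(lists[2])  # [1, 2, 3, 5], the OldEntryList for row 3
```

Each entry of the matrix is visited once, so building all the lists stays within O(nm) time.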
Shadow Tree After Processing the First Two Rows

[Figure: the shadow tree built from rows 1 and 2, hanging from "root", with edges labeled 1, 2, 3, 4 and 5. The genotype matrix is shown alongside with rows 1, 2 and 3 marked.]

OldEntryList for row 3: 1, 2, 3, 5
Algorithm – FirstPath

[Figure: the shadow tree hanging from "root", with edges labeled 1 to 5, each label appearing twice.]

OldEntryList: 1, 2, 3, 5
CheckList: 3, 2

Edges 4 and 5 cannot be on the same path to the root in any PPH solution
Algorithm – SecondPath

[Figure: the shadow tree hanging from "root", with edges labeled 1 to 5, each label appearing twice.]

OldEntryList: 1, 2, 3, 5
CheckList: 3, 2
Shadow Tree to PPH Solutions (1/2)

Genotype Matrix
Sites: 1 2 3 4 5
a:     2 2 2 0 0
b:     2 0 0 2 2
c:     2 2 2 0 2
d:     2 0 0 2 0

[Figure: the final shadow tree hanging from "root", with edges labeled 1 to 5, each label appearing twice; beside it, one PPH solution drawn as a tree with edges 1, 2, 3, 4 and 5.]
Shadow Tree to PPH Solutions (2/2)

[Figure: the final shadow tree hanging from "root", with edges labeled 1 to 5, each label appearing twice; beside it, a second PPH solution drawn as a tree with edges 1, 2, 3, 4 and 5 and leaves labeled (a,d), (b,c), (b,d) and (a,c).]
The End
A P-Class of PPH Solutions

Genotype Matrix
Sites: 1 2 3 4 5
a:     2 2 2 0 0
b:     2 0 0 2 2
c:     2 2 2 0 2
d:     2 0 0 2 0

[Figure: one PPH solution drawn as a tree from "root", with edges 1, 2, 3, 4 and 5 and leaves labeled (a,d), (a,c), (b,d) and (b,c).]

P-Class: Maximum common subgraph in all PPH solutions
Each P-Class consists of two subtrees
P-Class Property of PPH Solutions

All PPH solutions can be obtained by choosing how to flip each P-Class.

[Figure: two trees side by side, "One PPH Solution" and "Second PPH Solution", each with edges 1, 2, 3, 4 and 5, leaves labeled (a,d), (a,c), (b,c) and (b,d), and the switching points marked.]
The Key Theorem
- Every PPH solution can be obtained by choosing a flip for each P-Class.
- Conversely, after fixing one P-Class, every distinct choice of flips of P-Classes leads to a distinct PPH solution.
- If there are k P-Classes, there are 2^(k-1) distinct PPH solutions.
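
The count in the theorem can be illustrated with a short Python sketch that enumerates flip choices with one P-Class held fixed. It only counts choices and does not build any tree. `flip_choices` is a hypothetical name, and treating P-Class 0 as the fixed one is an arbitrary choice made for the illustration.

```python
from itertools import product


def flip_choices(num_p_classes):
    """Yield one tuple per distinct solution; entry i says whether P-Class i is flipped."""
    if num_p_classes < 1:
        raise ValueError("need at least one P-Class")
    # P-Class 0 is held fixed (never flipped); every other class may flip freely.
    for rest in product((False, True), repeat=num_p_classes - 1):
        yield (False,) + rest


choices = list(flip_choices(3))
print(len(choices))  # 4, i.e. 2^(3-1)
```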
Shadow Tree
- Contains classes
- Each class in the shadow tree is a subgraph of a P-Class
- Merging classes results in larger classes; classes are never split
- Contains tree edges and shadow edges
Overview of the Algorithm for One Row
- Procedure FirstPath
- Procedure SecondPath
- Procedure FixTree
- Procedure NewEntries
Procedures FirstPath and SecondPath
- FirstPath: Construct a first path towards the root of the shadow tree which passes through tree edges of as many columns in OldEntryList as possible
- SecondPath: Construct a second path towards the root of the shadow tree which passes through tree edges of columns in OldEntryList and not on the first path