Computational Problems in Perfect Phylogeny Haplotyping: Xor-Genotypes and Tag SNPs

Computational Problems in Perfect Phylogeny Haplotyping: Xor-Genotypes and Tag SNPs. Tamar Barzuza (1), Jacques S. Beckmann (2,3), Ron Shamir (4), Itsik Pe’er (5). (1) Computer Science and Applied Mathematics, Weizmann Institute of Science; (2) Molecular Genetics, Weizmann Institute of Science.

Transcript of Computational Problems in Perfect Phylogeny Haplotyping: Xor-Genotypes and Tag SNPs

Page 1: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

Computational Problems in Perfect Phylogeny Haplotyping: Xor-Genotypes and Tag SNPs

Tamar Barzuza (1), Jacques S. Beckmann (2,3), Ron Shamir (4), Itsik Pe’er (5)

(1) Computer Science and Applied Mathematics, Weizmann Institute of Science
(2) Molecular Genetics, Weizmann Institute of Science
(3) Génétique Médicale, Universitätsspital Lausanne
(4) School of Computer Science, Tel-Aviv University
(5) Medical and Population Genetics Group, Broad Institute

Page 2: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

Overview

Introduction
Xor PPH
    Theoretical outlines and results
    Experimental results
Informative SNPs
    Theoretical results
Summary and Future research

Page 3: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

Chromosomes

Page 4: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

AATATATCGCTATCCGTATACCTAATTGGGGGTGTGTGTACGTAATGCTAGCACGCGCGCCAGGATTAGCTGCCACA

AATATATCGCTTTCCGTATACCTAATTTGGGGTGTGTGTACGTAATGCTAGCACGCGCGCCAGGATTAGCTGCCACA

AATATATCGCTTTCCGTATACCTAATTTGGGGTGTGTGTACGTACTGCTAGCACGCGCGCCAGGATTAGCTGCCACA

AATATATCGCTATCCGTATACCTAATTTGGGGTGTGTGTACGTACTGCTAGCACGCGCGCTAGGATTAGCTGCCACA

AATATATCGCTATCCGTATACCTAATTGGGGGTGTGTGTACGTACTGCTAGCACGCGCGCTAGGATTAGCTGCCACA

AATATATCGCTATCCGTATACCTAATTGGGGGTGTGTGTACGTACTGCTAGCACGCGCGCTAGGATTAGCTGCCACA

ATTAAA

GTTTGG

AACCCC

CCCTTT

SNP – Single nucleotide polymorphism

Page 5: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

ATTAAA

GTTTGG

AACCCC

CCCTTT

SNP – Single nucleotide polymorphism

Page 6: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

Haplotypes, Genotypes and XOR-Genotypes

SNP: 1 2 3 4

Genotype: A/T T/G A C

Haplotypes: A G A C
            T T A C

XOR-Genotype: Het Het Hom Hom

ATTAAA

GTTTGG

AACCCC

CCCTTT

100111

100011

001111

111000

Page 7: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

Haplotypes, Genotypes and XOR-Genotypes

SNP: 1 2 3 4

ATTAAA

GTTTGG

AACCCC

CCCTTT

100111

100011

001111

111000

Genotype: 2 2 0 1

Haplotypes: 1 1 0 1
            0 0 0 1

XOR-Genotype: {1, 2}
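The encoding on this slide can be reproduced with a few lines of Python. This is only an illustrative sketch: the function names `genotype` and `xor_genotype` are made up here, and it assumes the convention shown above (0 or 1 for a homozygous SNP, 2 for a heterozygous one, SNPs numbered from 1).

```python
def genotype(h1, h2):
    """Genotype per SNP: the shared allele if homozygous, 2 if heterozygous."""
    return [a if a == b else 2 for a, b in zip(h1, h2)]


def xor_genotype(h1, h2):
    """Set of heterozygous SNPs (numbered from 1)."""
    return {i for i, (a, b) in enumerate(zip(h1, h2), start=1) if a != b}


# The haplotype pair from the slide.
h1 = [1, 1, 0, 1]
h2 = [0, 0, 0, 1]
print(genotype(h1, h2))      # [2, 2, 0, 1]
print(xor_genotype(h1, h2))  # {1, 2}
```

The xor-genotype keeps only which SNPs are heterozygous; the alleles at homozygous SNPs are lost, which is why extra genotype information is needed later to resolve the haplotypes.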

Page 8: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

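Perfect Phylogeny

Haplotypes (SNPs only):
1 0 0 0 0
1 0 0 1 0
1 0 1 0 0
1 1 0 0 0
1 0 0 1 1
0 0 0 1 0

[Figure: perfect phylogeny tree on these haplotypes, with each edge labeled by the SNP that mutates on it. From 1 0 0 1 0: edge "4: 1→0" leads to 1 0 0 0 0, edge "1: 1→0" leads to 0 0 0 1 0, edge "5: 0→1" leads to 1 0 0 1 1. From 1 0 0 0 0: edge "2: 0→1" leads to 1 1 0 0 0, edge "3: 0→1" leads to 1 0 1 0 0.]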
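The slide does not say how to check whether a haplotype set admits a perfect phylogeny. One standard criterion for binary data, brought in here as outside background rather than taken from the talk, is the four-gamete test: an (unrooted) perfect phylogeny exists exactly when no pair of SNPs shows all four combinations 00, 01, 10, 11. A minimal sketch, with the hypothetical function name `admits_perfect_phylogeny`:

```python
from itertools import combinations


def admits_perfect_phylogeny(haplotypes):
    """Four-gamete test on a list of equal-length 0/1 haplotypes."""
    m = len(haplotypes[0])
    for i, j in combinations(range(m), 2):
        # Collect the allele combinations seen at SNPs i and j.
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False
    return True


# The six haplotypes from the slide.
slide_haplotypes = [
    [1, 0, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
]
print(admits_perfect_phylogeny(slide_haplotypes))  # True
```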

Page 9: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

Previous work

Haplotyping: haplotypes from genotypes:
Input: Genotypes $G=\{G_1,\dots,G_n\}$ on SNPs $S=\{s_1,\dots,s_m\}$
Output: Find the haplotypes $H=\{H_1,\dots,H_{2n}\}$ that gave rise to $G$

General heuristics: Clark ’90, Excoffier+Slatkin ’95

PPH: Perfect phylogeny haplotyping ($n$ genotypes, $m$ SNPs):
Gusfield 2002 $O(nm\,\alpha(n,m))$
Bafna et al. 2002 $O(nm^2)$
Eskin et al. 2003 $O(nm^2)$

Graph Realization

Page 10: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

Previous work

The graph realization problem:
Input: A hypergraph $H=(\{1,\dots,m\}, P)$, $P=\{P_1,P_2,\dots,P_n\}$, $P_i \subseteq \{1,\dots,m\}$
Goal: A tree $T=(V,E)$ with $E = N = \{1,\dots,m\}$ s.t. each $P_i$ labels a path in $T$

Tutte 1959 $O(n^2 m)$, Gavril and Tamari 1983 $O(nm^2)$, Bixby and Wagner 1988 $O(nm\,\alpha(n,m))$

Input: { {1,2}, {2,3} }
Output: [Figure: a path whose edges are labeled 1, 2, 3 in that order]
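To make the goal concrete, the sketch below checks a proposed solution rather than constructing one: given a tree whose edges carry the labels 1..m, it verifies that every hyperedge $P_i$ labels a single path. The representation (a dict from edge label to endpoint pair) and the names `is_path` and `realizes` are assumptions made for this example. It also assumes the input really is a tree.

```python
from collections import Counter


def is_path(tree_edges, labels):
    """True if the tree edges with the given labels form one simple path."""
    degree = Counter()
    for label in labels:
        u, v = tree_edges[label]
        degree[u] += 1
        degree[v] += 1
    # Any edge subset of a tree is a forest. It is a single path iff it is
    # connected (|V| = |E| + 1) and no vertex has degree above 2.
    return (len(degree) == len(labels) + 1
            and max(degree.values(), default=0) <= 2)


def realizes(tree_edges, hyperedges):
    """True if every hyperedge labels a path in the tree."""
    return all(is_path(tree_edges, p) for p in hyperedges)


# The slide's example: the path with edges 1, 2, 3 realizes {{1,2},{2,3}}.
path_tree = {1: ("a", "b"), 2: ("b", "c"), 3: ("c", "d")}
print(realizes(path_tree, [{1, 2}, {2, 3}]))  # True
```

A star with three edges, by contrast, cannot host the hyperedge {1, 2, 3} as a path, since its center would need degree 3.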

Page 11: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

Overview

Introduction
Xor PPH
    Theoretical outlines and results
    Experimental results
Informative SNPs
    Theoretical results
Summary and Future research

Page 12: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

XPPH - Xor perfect phylogeny haplotyping

Xor-haplotyping: haplotypes from xor-genotypes:
Input: 1. Xor-genotype data (can be obtained by DHPLC)
       2. Three genotypes
Goal: Resolve the haplotypes and their perfect phylogeny

Xor-genotypes    Genotypes
{1, 2}           0/1 0/1 0   1
{2, 4}           0   0/1 0   0/1
{2, 3, 4}        0   0/1 0/1 0/1
{1, 2, 4}        0/1 0/1 0   0/1
{1}              0/1 1   0   0

Haplotypes (unknown, marked "?" on the slide):
1 1 0 1 / 0 1 0 1
1 1 0 1 / 0 0 0 1
0 1 0 1 / 0 0 0 0
0 1 1 1 / 0 0 0 0
1 1 0 1 / 0 0 0 0

Page 13: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

XPPH - Xor perfect phylogeny haplotyping

Xor-haplotyping: haplotypes from xor-genotypes:
Input: 1. Xor-genotype data (can be obtained by DHPLC)
       2. Three genotypes
Goal: Resolve the haplotypes and their perfect phylogeny

Xor-genotypes    Genotypes
{1, 2}           0/1 0/1 0   1
{2, 4}           0   0/1 0   0/1
{2, 3, 4}        0   0/1 0/1 0/1
{1, 2, 4}        0/1 0/1 0   0/1
{1}              0/1 1   0   0

Haplotypes: ? (one unknown pair per individual)

Page 14: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

XPPH - Xor perfect phylogeny haplotyping

Strategy:
1. Input: Xor-genotype data
   Goal: Find the perfect phylogeny
2. Additional input: 3 genotypes
   Goal: Find haplotypes

Step 1:
Xor-genotype = {Het SNPs} = a path in the perfect phylogeny
Build a tree from its paths: graph realization
Input reduction: Merge SNPs that are equivalent in the xor-data
Proof: A unique graph realization solution yields a perfect phylogeny
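The input-reduction step can be sketched as follows. The slide does not define "equivalent"; this example assumes two SNPs are equivalent when they occur in exactly the same set of xor-genotypes, so the xor-data cannot tell them apart. The function name `merge_equivalent_snps` is hypothetical.

```python
from collections import defaultdict


def merge_equivalent_snps(xor_genotypes, snps):
    """Group SNPs that occur in exactly the same xor-genotypes."""
    groups = defaultdict(list)
    for s in snps:
        # Signature: indices of the xor-genotypes that contain this SNP.
        signature = frozenset(
            i for i, x in enumerate(xor_genotypes) if s in x
        )
        groups[signature].append(s)
    return sorted(groups.values())


# SNPs 1 and 2 always appear together, so they are merged.
data = [{1, 2, 3}, {3, 4}, {1, 2}]
print(merge_equivalent_snps(data, [1, 2, 3, 4]))  # [[1, 2], [3], [4]]
```

Each merged group then stands for a single edge of the tree handed to graph realization.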

Page 15: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

GREAL

We implemented Gavril & Tamari’s algorithm (’83) for graph realization: $O(m^2 n)$
Find graph realization or determine that none exists
Count number of graph realization solutions for data
Stable and fast
Available at http://www.cs.tau.ac.il/~rshamir/greal/

Simulations

Simulate data of $n$ individuals using Hudson 2002
Remove all SNPs with <5% minor allele frequency
Apply GREAL: Is there a single solution?
Repeat 5000 times for each $n$
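The minor-allele-frequency filter in the simulation pipeline might look like the sketch below. It is not GREAL code. The function name `filter_by_maf` and the input layout (a list of 0/1 haplotypes) are assumptions for illustration.

```python
def filter_by_maf(haplotypes, threshold=0.05):
    """Drop SNP columns whose minor allele frequency is below threshold.

    Returns the kept column indices (0-based) and the reduced haplotypes.
    """
    n = len(haplotypes)
    m = len(haplotypes[0])
    kept = []
    for j in range(m):
        freq = sum(h[j] for h in haplotypes) / n  # frequency of allele 1
        if min(freq, 1 - freq) >= threshold:
            kept.append(j)
    return kept, [[h[j] for j in kept] for h in haplotypes]


haps = [[1, 0, 1], [0, 0, 1], [0, 1, 0], [0, 1, 0]]
# With a 30% cutoff the first SNP (minor allele frequency 25%) is removed.
print(filter_by_maf(haps, threshold=0.3)[0])  # [1, 2]
```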

Page 16: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

Results

[Figure: The percentage of single solutions vs sample size]

Page 17: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

Results

[Figure: The percentage of single solutions vs sample size]

R.H. Chung and D. Gusfield 2003

Page 18: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

XPPH

Step 2: Perfect phylogeny → Haplotypes?

Xor-genotypes: {1, 2} {1, 3} {2, 3}

[Figure: the same tree with edges 1, 2, 3 shown twice. One copy carries the haplotypes 0 0 0, 1 1 0, 1 0 1; the other carries 1 0 0, 0 1 0, 0 0 1.]

Resolution up to bit flipping: gives the haplotype structure
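A quick way to see why xor-genotypes only determine the haplotypes up to bit flipping: flipping one SNP in every haplotype leaves all pairwise xor-genotypes unchanged. The sketch below checks this on the two haplotype sets from the slide. The helper names `xor_genotypes` and `flip` are made up for this example.

```python
from itertools import combinations


def xor_genotypes(haplotypes):
    """Xor-genotype (differing SNPs, numbered from 1) of every pair."""
    return [
        {i for i, (a, b) in enumerate(zip(h1, h2), start=1) if a != b}
        for h1, h2 in combinations(haplotypes, 2)
    ]


def flip(haplotypes, snp):
    """Flip the allele of one SNP (numbered from 1) in every haplotype."""
    return [
        [1 - a if i == snp else a for i, a in enumerate(h, start=1)]
        for h in haplotypes
    ]


original = [[0, 0, 0], [1, 1, 0], [1, 0, 1]]
flipped = flip(original, 1)
print(flipped)                                            # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(xor_genotypes(original) == xor_genotypes(flipped))  # True
```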

Page 19: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

XPPH

Step 2: Perfect phylogeny → Haplotypes?

Xor-genotypes: {1, 2} {1, 3} {2, 3}

[Figure: tree with edges 1, 2, 3]

Genotype: 1 2 2

Haplotypes: 1 x x
            1 x x
            0 x x

SNP #1 homozygous: can infer SNP #1 for all haplotypes
Need individuals whose xor-genotypes (= {het SNPs}) have an empty intersection

Page 20: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

Theorem: $\bigcap$ xor-genotypes $= \emptyset$ $\Rightarrow$ there are three xor-genotypes with empty intersection

Proof: Note that xor-genotypes are tree paths (otherwise: NP-hard)

(1) The intersection of two tree paths is an interval

Page 21: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

(Proof) (2) Pick $X_1$ arbitrarily, take $X_1 \cap X_2,\ X_1 \cap X_3,\ \dots,\ X_1 \cap X_n$

[Figure: the path $X_1$]

Page 22: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

(Proof) (2) Pick $X_1$ arbitrarily, take $X_1 \cap X_2,\ X_1 \cap X_3,\ \dots,\ X_1 \cap X_n$

[Figure: the path $X_1$]

Page 23: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

(Proof) (2) Pick $X_1$ arbitrarily, take $X_1 \cap X_2,\ X_1 \cap X_3,\ \dots,\ X_1 \cap X_n$

(3) $X_L$ ends first, $X_R$ begins last

[Figure: the paths $X_L$ and $X_R$ drawn against the path $X_1$]

Page 24: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

(Proof) (2) Pick $X_1$ arbitrarily, take $X_1 \cap X_2,\ X_1 \cap X_3,\ \dots,\ X_1 \cap X_n$

(3) $X_L$ ends first, $X_R$ begins last

[Figure: the paths $X_L$ and $X_R$ drawn against the path $X_1$]

Page 25: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

(Proof) (2) Pick $X_1$ arbitrarily, take $X_1 \cap X_2,\ X_1 \cap X_3,\ \dots,\ X_1 \cap X_n$

$X_1 \cap X_L \cap X_R = \emptyset$

[Figure: the paths $X_L$ and $X_R$ drawn against the path $X_1$]

Page 26: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

Find 3 individuals to genotype in $O(nm)$

Resolve the haplotypes

[Figure: the paths $X_L$ and $X_R$ drawn against the path $X_1$]
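The proof steps translate into a short selection routine. The sketch below is one possible reading, not the authors' implementation. It assumes the SNPs of $X_1$ are given in their order along the tree path (the list `x1_order`), so that each $X_1 \cap X_i$ is an interval of positions. The name `find_triple` and the return convention are made up for this example.

```python
def find_triple(x1_order, others):
    """Pick X_L and X_R so that X1, X_L, X_R have an empty intersection.

    x1_order: SNPs of X1 in path order (assumed known from the tree).
    others:   the remaining xor-genotypes, as sets of SNPs.
    Returns indices (L, R) into others, or None if all paths share a SNP.
    L and R coincide when one path is already disjoint from X1.
    """
    if not others:
        return None
    position = {snp: k for k, snp in enumerate(x1_order)}
    starts, ends = [], []
    for x in others:
        hits = [position[s] for s in x if s in position]
        # An empty intersection is encoded as an interval that
        # starts after the end of X1 and ends before its beginning.
        starts.append(min(hits, default=len(x1_order)))
        ends.append(max(hits, default=-1))
    left = min(range(len(others)), key=ends.__getitem__)     # ends first
    right = max(range(len(others)), key=starts.__getitem__)  # begins last
    if ends[left] < starts[right]:
        return left, right
    return None


x1 = [1, 2, 3, 4]
print(find_triple(x1, [{1, 2, 5}, {3, 4, 6}, {2, 3}]))  # (0, 1)
```

Each xor-genotype is scanned once, which is in line with the $O(nm)$ bound on the slide. If the interval that ends first still overlaps the one that begins last, every interval contains a common position and no triple exists.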

Page 27: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

Overview

Introduction
Xor PPH
    Theoretical outlines and results
    Experimental results
Informative SNPs
    Theoretical results
Summary and Future research

Page 28: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

Informative SNPs

Informative SNPs (Bafna et al. 2003):
Input: 1. Haplotypes $H=\{H_1,\dots,H_n\}$ on SNPs $S=\{s_1,\dots,s_m\}$
       2. A set of interesting SNPs $S'' \subseteq S$
Output: Minimal set $S' \subseteq S \setminus S''$ that distinguishes the same haplotypes as $S''$

Haplotypes 1-4 (rows) by SNPs 1-5 (columns):
1 0 0 0 0
0 0 1 0 0
0 0 0 1 1
0 1 0 1 0

Not perfect phylogeny: NP-hard (MINIMUM TEST SET)
Perfect phylogeny, 1 interesting SNP: $O(nm)$, Bafna et al. 2003

Page 29: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

Informative SNPs

Informative SNPs:
Input: 1. Haplotypes $H=\{H_1,\dots,H_n\}$ on SNPs $S=\{s_1,\dots,s_m\}$
       2. A set of interesting SNPs $S'' \subseteq S$
       3. A perfect phylogeny for $H$.
       4. A cost function $C: S \to \mathbb{R}^+$.
Output: $S' \subseteq S \setminus S''$ with minimal cost that distinguishes the same haplotypes as $S''$

Generalization of the previous definition

Haplotypes 1-4 (rows) by SNPs 1-5 (columns):
1 0 0 0 0
0 0 1 0 0
0 0 0 1 1
0 1 0 1 0
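The problem statement can be illustrated with an exhaustive search. This is deliberately not the authors' method: it ignores the perfect phylogeny and takes exponential time, so it only serves to pin down the definition on small inputs. It assumes that "distinguishes the same haplotypes" means every pair of haplotypes separated by some SNP in $S''$ is also separated by some SNP in $S'$. The function names and the cost values in the example are hypothetical.

```python
from itertools import combinations


def separated_pairs(haplotypes, snps):
    """Haplotype index pairs that differ on at least one SNP in snps (1-based)."""
    return {
        (a, b)
        for a, b in combinations(range(len(haplotypes)), 2)
        if any(haplotypes[a][s - 1] != haplotypes[b][s - 1] for s in snps)
    }


def min_cost_informative(haplotypes, interesting, cost):
    """Brute-force search for the cheapest S' outside the interesting set."""
    m = len(haplotypes[0])
    candidates = [s for s in range(1, m + 1) if s not in interesting]
    target = separated_pairs(haplotypes, interesting)
    best = None
    for k in range(len(candidates) + 1):
        for subset in combinations(candidates, k):
            if target <= separated_pairs(haplotypes, subset):
                c = sum(cost[s] for s in subset)
                if best is None or c < best[1]:
                    best = (subset, c)
    return best  # (subset, total cost), or None if no subset works


# The 4 x 5 haplotype matrix from the slide, with SNP 4 as the interesting SNP.
H = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 1, 0, 1, 0],
]
costs = {1: 5, 2: 1, 3: 5, 5: 1}  # made-up costs for illustration
print(min_cost_informative(H, {4}, costs))  # ((2, 5), 2)
```

With unit costs this reduces to the minimum-size definition from Bafna et al. 2003.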

Page 30: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

We find an informative SNP set
    Of minimal cost
    For any number of interesting SNPs
    In $O(m)$
    By a dynamic programming algorithm that climbs up the perfect phylogeny tree

We prove that the definition of informative SNPs generalizes to a more practical definition
    Under the perfect phylogeny model, informative SNPs on genotypes and haplotypes are equivalent

Page 31: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

Summary

Xor-haplotyping:
    Definition
    Resolve haplotypes given xor-data and 3 genotypes in $O(nm\,\alpha(m,n))$
    Implementation
    Experimental results

Selection of tag SNPs:
    Generalize to
        arbitrary cost
        many interesting SNPs
    Find optimal informative SNPs set in $O(m)$ time
    Combinatorial observation allows practical uses

Page 32: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

Future research

Relax the strong assumption of perfect phylogeny
Deal with data errors and missing data
Obtain empirical results for the theoretical work on informative SNPs
    Preliminary results show that blocks of up to 600 SNPs are distinguishable by ~20 informative SNPs

Page 33: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs
Page 34: Computational Problems in Perfect Phylogeny Haplotyping:        Xor-Genotypes and Tag SNPs

Theorem: All genotypes are distinct within a block

Proof: Assume to the contrary equivalency of two:

[Figure: two haplotype pairs (Haplotype Pair 1, Haplotype Pair 2) shown with the genotypes they produce (Genotype 1, Genotype 2)]