Identification of SNP Alleles in DNA Sequences Giuseppe Lancia Università di Padova e Celera...
Identification of SNP Alleles in DNA Sequences
Giuseppe Lancia, Università di Padova and Celera Genomics
Polymorphisms
A polymorphism is a feature that:
- is common to everybody;
- is not identical in everybody;
- has just a few possible variants (alleles).
E.g., think of eye color, or of blood type for a feature not visible from outside.
At the DNA level, a polymorphism is a sequence of nucleotides that varies within a population.
The shortest possible such sequence has only 1 nucleotide, hence the
Single Nucleotide Polymorphism (SNP)
atcggattagttagggcacaggacggac
atcggattagttagggcacaggacgtac
atcggcttagttagggcacaggacgtac
atcggattagttagggcacaggacggac
atcggcttagttagggcacaggacgtac
atcggcttagttagggcacaggacggac
atcggattagttagggcacaggacgtac
atcggattagttagggcacaggacgtac
atcggattagttagggcacaggacggac
atcggcttagttagggcacaggacggac
atcggattagttagggcacaggacggac
atcggcttagttagggcacaggacggac
atcggattagttagggcacaggacggac
atcggattagttagggcacaggacggac
- SNPs are the predominant form of human variation
- Used for drug design, disease studies, forensics, evolutionary studies...
- On average, one every 1,000 bases
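In aligned sequences like the ones above, SNP sites can be located by scanning the columns. The sketch below is a minimal Python version. It assumes the sequences are already aligned to equal length, and the name `snp_columns` is hypothetical.

```python
def snp_columns(seqs):
    """Return {0-based column: sorted alleles} for every column where the
    aligned sequences do not all carry the same nucleotide."""
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    snps = {}
    for col in range(length):
        alleles = {s[col] for s in seqs}
        if len(alleles) > 1:  # polymorphic column
            snps[col] = sorted(alleles)
    return snps


if __name__ == "__main__":
    seqs = [
        "atcggattagttagggcacaggacggac",
        "atcggattagttagggcacaggacgtac",
        "atcggcttagttagggcacaggacgtac",
        "atcggcttagttagggcacaggacggac",
    ]
    print(snp_columns(seqs))  # {5: ['a', 'c'], 25: ['g', 't']}
```

On the 14 sequences above, this finds two SNP sites: position 6 (a/c) and position 26 (g/t), counting from 1.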
HOMOZYGOUS: same allele on both chromosomes
HETEROZYGOUS: different alleles on the two chromosomes
HAPLOTYPE: the content of one chromosome at the SNP sites
Taking consecutive sequences above as the two chromosomes of one individual, the 7 individuals have these haplotype pairs at the 2 SNP sites:
ag at
ct ag
ct cg
at at
ag cg
ag cg
ag ag
GENOTYPE: “union” of the 2 haplotypes (at each SNP site, the alleles present, without knowing which chromosome carries which)
Genotypes of the 7 individuals (O_x: homozygous site with allele x; E: heterozygous site):
OaE
EE
OcE
OaOt
EOg
EOg
OaOg
CHANGE OF SYMBOLS: biologically, each SNP takes only two values in a population.
Call them X and O. Also, write ? when a site is heterozygous.
HAPLOTYPE: string over {X, O}
GENOTYPE: string over {X, O, ?}
xo + xx = x?
ox + xo = ??
ox + oo = o?
xx + xx = xx
xo + oo = ?o
xo + oo = ?o
xo + xo = xo
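In the X/O/? alphabet, combining two haplotypes into a genotype is a site-by-site comparison. The sketch below is a minimal Python version (lower-case symbols, as in the examples above). The function name `genotype` is hypothetical.

```python
def genotype(h1, h2):
    """Combine two haplotypes over {'x', 'o'} into a genotype over
    {'x', 'o', '?'}: keep the allele where both agree (homozygous site),
    write '?' where they differ (heterozygous site)."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same length")
    return "".join(a if a == b else "?" for a, b in zip(h1, h2))


# The seven individuals above, as haplotype pairs over {x, o}.
pairs = [("xo", "xx"), ("ox", "xo"), ("ox", "oo"), ("xx", "xx"),
         ("xo", "oo"), ("xo", "oo"), ("xo", "xo")]
print([genotype(a, b) for a, b in pairs])
# ['x?', '??', 'o?', 'xx', '?o', '?o', 'xo']
```

The map loses phase. For example, `genotype("xo", "ox")` and `genotype("xx", "oo")` both give `"??"`, which is why the population problem is ambiguous.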
THE HAPLOTYPING PROBLEM
Single Individual: given genomic data of one individual, determine 2 haplotypes (one per chromosome).
Population: given genomic data of k individuals, determine (at most) 2k haplotypes (one per chromosome per individual).
For the individual problem, the input is erroneous haplotype data, from sequencing.
For the population problem, the input is ambiguous genotype data, from screening.
The objective is guided by Occam’s razor: find the minimum explanation of the observed data under the given hypothesis (a.k.a. the parsimony principle).
Theory and Results
Single individual
- Polynomial algorithms for gapless haplotyping (Lancia, Bafna, Istrail, Lippert, Schwartz 01 & Bafna, Lancia, Istrail, Rizzi 02)
- Polynomial algorithms for bounded-length gapped haplotyping (BLIR 02)
- NP-hardness for general gapped haplotyping (LBILS 01)
Population
- APX-hardness (Gusfield 00)
- Reduction to a graph-theoretic model and I.P. approach (Gusfield 01)
- New formulations and disease detection (Lancia, Pesole 02)
The Single-Individual Haplotyping Problem
(Figure) Shotgun Assembly of a Chromosome [Weber and Myers, 1997]: copies of the chromosome are broken into fragments (fragmentation), the fragments are read (sequencing), and the reads are put back together (assembly).
ERROR SOURCES
Sequencing errors:
ACTGCCTGGCCAATGGAACGGACAAG
CTGGCCAAT CATTGGAAC AATGGAACGGA
Paralogous regions:
ACAAACCCTTTGGGACT … CTAGTAAACCCTATGGGGA
AAACCCTT TAAACCCT CTATGGGA CCTATGG CTTTGGGACT ACCCTATGGG
Given errors (sequencing errors and/or paralogous regions), the data may be inconsistent with exactly 2 haplotypes.
Hence, the assembler is unable to build 2 chromosomes.
PROBLEM: Find and remove the errors so that the data becomes consistent with exactly 2 haplotypes.
(Figure: sequenced fragments, with the allele at each SNP site written as X or O.)
The data: a SNP matrix
SNPs 1,..,n (columns) and fragments 1,..,m (rows); '-' means the fragment does not cover that SNP:
   1 2 3 4 5 6 7 8 9
1  - - - O X X O O -
2  - O - O X - - - X
3  X X O X X - - - -
4  O O X - - - - O -
5  - - - - - - - X O
6  - - - - O O O X -
Fragment conflict: two fragments with different alleles at a SNP they both cover can’t be on the same haplotype.
(Figure: Fragment Conflict Graph GF(M) on fragments 1-6, with an edge between every two conflicting fragments.)
The data are consistent with exactly 2 haplotypes iff GF(M) is BIPARTITE.
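The bipartiteness test can be made concrete with a small Python sketch. It builds GF(M) for the matrix above and 2-colors it by breadth-first search. Rows are strings over 'X', 'O', '-', and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def fragments_conflict(f, g):
    """Two fragments conflict if, at some SNP covered by both
    ('-' = not covered), they carry different alleles."""
    return any(a != "-" and b != "-" and a != b for a, b in zip(f, g))


def fragment_conflict_graph(M):
    """Edges of GF(M) as pairs of 0-based row indices."""
    return {(i, j) for i, j in combinations(range(len(M)), 2)
            if fragments_conflict(M[i], M[j])}


def is_bipartite(n, edges):
    """BFS 2-coloring of a graph on nodes 0..n-1; False on an odd cycle."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    color = {}
    for start in range(n):
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False
    return True


# The SNP matrix above: rows = fragments 1..6, columns = SNPs 1..9.
M = ["---OXXOO-", "-O-OX---X", "XXOXX----",
     "OOX----O-", "-------XO", "----OOOX-"]
E = fragment_conflict_graph(M)
print(sorted((i + 1, j + 1) for i, j in E))
print(is_bipartite(len(M), E))  # False (fragments 1, 3, 6 form a triangle)
```

Dropping fragment 6 (`M[:5]`) removes every odd cycle, and `is_bipartite` then returns `True`.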
PROBLEM (Min Fragment Removal): remove the fewest fragments so that GF(M) becomes bipartite.
Removing fragment 6 makes GF(M) bipartite; each side yields one haplotype:
   1 2 3 4 5 6 7 8 9
1  - - - O X X O O -
2  - O - O X - - - X
4  O O X - - - - O -
=> O O X O X X O O X
3  X X O X X - - - -
5  - - - - - - - X O
=> X X O X X - - X O
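To make the problem concrete on this 6-fragment example, here is a brute-force Python sketch. It returns every minimum set of fragments (1-based) whose removal leaves a bipartite fragment conflict graph. It is exponential and only meant for tiny matrices, and the helper names are hypothetical.

```python
from itertools import combinations


def conflict(f, g):
    # Different alleles at a SNP covered by both fragments ('-' = gap).
    return any(a != "-" and b != "-" and a != b for a, b in zip(f, g))


def bipartite(rows):
    """2-color the fragment conflict graph of `rows` by DFS."""
    color = [None] * len(rows)
    for s in range(len(rows)):
        if color[s] is not None:
            continue
        color[s] = 0
        stack = [s]
        while stack:
            u = stack.pop()
            for v in range(len(rows)):
                if v != u and conflict(rows[u], rows[v]):
                    if color[v] is None:
                        color[v] = 1 - color[u]
                        stack.append(v)
                    elif color[v] == color[u]:
                        return False
    return True


def min_fragment_removals(M):
    """All minimum-size sets of fragments (1-based) whose removal makes
    GF(M) bipartite. Exhaustive search: exponential in len(M)."""
    m = len(M)
    for k in range(m + 1):
        found = [tuple(i + 1 for i in drop)
                 for drop in combinations(range(m), k)
                 if bipartite([M[i] for i in range(m) if i not in drop])]
        if found:
            return found
    return []


M = ["---OXXOO-", "-O-OX---X", "XXOXX----",
     "OOX----O-", "-------XO", "----OOOX-"]
print(min_fragment_removals(M))  # [(3,), (6,)]
```

On this matrix the search finds two optimal answers. One is removing fragment 6, the choice shown above. The other is removing fragment 3, which leaves the sides {1, 2, 4} and {5, 6}.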
Removing the fewest fragments is equivalent to finding a maximum induced bipartite subgraph of GF(M), which is:
- NP-complete [Yannakakis, 1978a, 1978b; Lewis, 1978]
- O(|V| (log log |V| / log |V|)^2)-approximable [Halldórsson, 1999]
- not O(|V|^ε)-approximable for some ε > 0 [Lund and Yannakakis, 1993]
Are there classes of M for which GF(M) is easier to handle?
YES: gapless M, i.e., every fragment covers a contiguous run of SNPs:
---OXXOO---OXOOX---   (1 gap)
---OXXOOXOXOXOOX---   (gapless)
---OXX--XO----OX---   (2 gaps)
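The gap count of a fragment row follows directly from this definition. The sketch below counts runs of '-' that lie strictly between called SNPs; leading and trailing '-' are not gaps. It is a minimal Python version with hypothetical function names.

```python
import re


def count_gaps(row):
    """Number of gaps in a fragment row: maximal runs of '-' lying strictly
    between two called SNPs (leading/trailing '-' do not count)."""
    core = row.strip("-")
    return len(re.findall(r"-+", core))


def is_gapless(row):
    return count_gaps(row) == 0


for r in ["---OXXOO---OXOOX---",
          "---OXXOOXOXOXOOX---",
          "---OXX--XO----OX---"]:
    print(r, count_gaps(r))  # 1, 0 and 2 gaps respectively
```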
Why gaps?
- Sequencing errors: bases called with low confidence are not called at all:
---OOXX?XX--- ===> ---OOXX-XX---
- Celera’s mate pairs: the two ends of a longer clone are sequenced and treated as one fragment, so the SNPs between the two reads are not covered.
THEOREM
For a gapless M, the Min Fragment Removal problem is polynomial.
NOTE: M does not need to be gapless; it is enough that its columns can be reordered to make it gapless (Consecutive Ones Property, Booth and Lueker, 1976).
An O(nm + n^3) D.P. algorithm
1  - O O X X O O - -
2  - - X O X X O - -
3  - - - X X O - - -
4  - - - - O O X O -
5  - - - - - X O X O
LFT(i), RGT(i): leftmost and rightmost SNP covered by fragment i.
Sort the fragments according to LFT.
D(i; h, k) := minimum cost (number of fragments removed) to solve up to row i, given that fragments h and k are not removed, are put in different haplotypes, and have the largest RGT on their respective haplotypes.
D(i; h, k) = D(i-1; h, k)       if (i, k compatible and RGT(i) <= RGT(k)) or (i, h compatible and RGT(i) <= RGT(h))
D(i; h, k) = 1 + D(i-1; h, k)   otherwise
OPT is min over h, k of D(n; h, k), and it can be found in time O(nm + n^3).
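The following is a minimal Python sketch of a dynamic program in this spirit. It assumes rows are strings over 'X', 'O', '-' and that every row is gapless. It is a forward variant of the idea above, not the talk's exact recurrence. Fragments are scanned by LFT, and the state (h, k) records, for each haplotype, the kept fragment with the largest RGT. For gapless fragments sorted by LFT, that fragment covers every SNP where a new fragment can overlap the earlier fragments on its haplotype, so checking compatibility against it alone suffices. The function name is hypothetical, and this version is not tuned to the O(nm + n^3) bound.

```python
def min_fragment_removal_gapless(M):
    """Minimum number of fragments (rows over 'X', 'O', '-') to remove so
    that the remaining ones split into two haplotypes.
    Assumes every row is gapless."""
    rows = [r for r in M if r.strip("-")]  # all-'-' rows never conflict

    def lft(i):
        return min(j for j, c in enumerate(rows[i]) if c != "-")

    def rgt(i):
        if i is None:
            return -1
        return max(j for j, c in enumerate(rows[i]) if c != "-")

    def compatible(i, h):
        # An empty haplotype (h is None) accepts any fragment.
        return h is None or all(a == b for a, b in zip(rows[i], rows[h])
                                if a != "-" and b != "-")

    # state (h, k) -> fewest removals so far; h and k are the kept fragments
    # with the largest RGT on haplotype 1 and haplotype 2 (None = empty).
    states = {(None, None): 0}
    for i in sorted(range(len(rows)), key=lft):
        nxt = {}
        for (h, k), cost in states.items():
            moves = [((h, k), cost + 1)]  # remove fragment i
            if compatible(i, h):  # keep i on haplotype 1
                moves.append(((i if rgt(i) > rgt(h) else h, k), cost))
            if compatible(i, k):  # keep i on haplotype 2
                moves.append(((h, i if rgt(i) > rgt(k) else k), cost))
            for key, c in moves:
                if c < nxt.get(key, float("inf")):
                    nxt[key] = c
        states = nxt
    return min(states.values())


M = ["-OOXXOO--", "--XOXXO--", "---XXO---", "----OOXO-", "-----XOXO"]
print(min_fragment_removal_gapless(M))  # 1 (e.g. drop fragment 4)
```

On the 5-fragment matrix above, fragments 1, 2 and 4 pairwise conflict, so at least one removal is needed. Dropping fragment 4 leaves the consistent sides {1, 3} and {2, 5}.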
WITH GAPS…
Th: NP-hard with 2 gaps per fragment.
Proof (simple): use the fact that for every graph G there is a matrix M such that G = GF(M), and reduce from Max Bipartite Induced Subgraph on 3-regular graphs.
Th: NP-hard even with 1 gap per fragment.
Proof: technical; reduction from MAX2SAT.
But gaps must be long for the problem to be difficult: we have an O(2^(2L) mn + 2^(3L) n^3) D.P. for MFR on a matrix with total gap length L.
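The hardness proof relies on the fact that every graph is the fragment conflict graph of some SNP matrix. One simple construction, assumed here for illustration, uses one column per edge (u, v), with X in row u, O in row v, and '-' elsewhere. Two rows then conflict exactly when they share a column, i.e., when they are adjacent in G. This construction does not control the number of gaps per fragment, which the theorem needs. The function names are hypothetical.

```python
from itertools import combinations


def matrix_from_graph(n, edges):
    """Build a SNP matrix (rows = fragments 0..n-1) whose fragment conflict
    graph is exactly the graph with the given edges."""
    rows = [["-"] * len(edges) for _ in range(n)]
    for col, (u, v) in enumerate(edges):
        rows[u][col] = "X"  # fragments u and v disagree at this SNP only
        rows[v][col] = "O"
    return ["".join(r) for r in rows]


def conflict_edges(M):
    """Edges of GF(M) as pairs (i, j) with i < j."""
    return {(i, j) for i, j in combinations(range(len(M)), 2)
            if any(a != "-" and b != "-" and a != b
                   for a, b in zip(M[i], M[j]))}


triangle = [(0, 1), (1, 2), (0, 2)]
M = matrix_from_graph(3, triangle)
print(M)  # ['X-X', 'OX-', '-OO']
print(conflict_edges(M) == set(triangle))  # True
```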
Fragment removal is good for getting rid of contaminants.
However, we may want to keep all fragments and correct the errors in another way.
A dual point of view is to disregard some SNPs and keep the largest subset of SNPs sufficient to reconstruct the haplotypes.
All fragments get assigned to one of the two haplotypes. We describe the Min SNP Removal problem: remove the fewest columns from M so that the fragment conflict graph becomes bipartite.
SNP conflicts
   1 2 3 4 5 6 7 8 9
1  - - - O X X O O -
2  - O X O X - - - X
3  X X O X X - - - -
4  O O X - - - O O -
5  - - - - - - X X O
6  - - - - O O O X -
Pairs of SNPs are checked on pairs of fragments that cover both. If the two fragments agree on one SNP and differ on the other, the pair of SNPs is a CONFLICT: the fragments would have to be on the same haplotype and on different haplotypes at once. Otherwise the pair is OK.
SNP conflict graph GS(M): 1 node for each SNP (column), and an edge between every two conflicting SNPs.
(Figure: the SNP conflict graph GS(M) on SNPs 1-9.)
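The edges of GS(M) for the SNP-conflict matrix above can be listed with a short Python sketch. It assumes the definition just given: two SNPs conflict if two fragments cover both and agree on exactly one of them. The function name is hypothetical.

```python
from itertools import combinations


def snp_conflict_edges(M):
    """Edges of GS(M) as pairs of 0-based columns (a, b), a < b."""
    n_cols = len(M[0])
    edges = set()
    for a, b in combinations(range(n_cols), 2):
        covering = [r for r in M if r[a] != "-" and r[b] != "-"]
        # Conflict: some pair of fragments agrees on exactly one of a, b.
        if any((f[a] == g[a]) != (f[b] == g[b])
               for f, g in combinations(covering, 2)):
            edges.add((a, b))
    return edges


M = ["---OXXOO-", "-OXOX---X", "XXOXX----",
     "OOX---OO-", "------XXO", "----OOOX-"]
print(sorted((a + 1, b + 1) for a, b in snp_conflict_edges(M)))
```

For this matrix, the sketch reports the SNP pairs (2, 5), (3, 5), (4, 5), (5, 7), (6, 7) and (7, 8).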
THEOREM 1
For a gapless M, GF(M) is bipartite if and only if GS(M) is an independent set.
THEOREM 2
For a gapless M, GS(M) is a perfect graph.
COROLLARY
For a gapless M, the min SNP removal problem is polynomial.
PROOF of THEOREM 1 (sketch): by minimal counterexample.
Assume M is gapless and GS(M) is an independent set, but GF(M) is not bipartite. Take an odd cycle in GF(M).
(Figure: the fragments of the odd cycle, drawn as rows of M.)
- The cycle has a generic horizontal-vertical structure (“vertical lines” in the figure).
- There cannot be only one vertical line in an odd cycle.
- Merge the rightmost vertical line with the next one (in the figure, a forced entry “must be X”): the result is still a counterexample, with one vertical line fewer.
Hence, there cannot be a counterexample that is minimal in the number of vertical lines.
Note: Theorem 1 is not true if there are gaps. For the matrix M:
   1 2 3
1  O - O
2  - O X
3  X X -
GF(M) is a triangle on fragments 1, 2, 3 (not bipartite), while GS(M) has the three SNPs as nodes and no edges (an independent set).
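Both graphs can be checked for this gapped counterexample with a compact Python sketch. Fragment conflicts and SNP conflicts use the definitions above, and the function names are hypothetical.

```python
from itertools import combinations


def gf_edges(M):
    # Fragment conflict: different alleles at a SNP covered by both rows.
    return {(f, g) for f, g in combinations(range(len(M)), 2)
            if any(a != "-" and b != "-" and a != b
                   for a, b in zip(M[f], M[g]))}


def gs_edges(M):
    # SNP conflict: two rows cover both columns and agree on exactly one.
    edges = set()
    for a, b in combinations(range(len(M[0])), 2):
        rows = [r for r in M if r[a] != "-" and r[b] != "-"]
        if any((f[a] == g[a]) != (f[b] == g[b])
               for f, g in combinations(rows, 2)):
            edges.add((a, b))
    return edges


M = ["O-O", "-OX", "XX-"]
print(sorted(gf_edges(M)))  # [(0, 1), (0, 2), (1, 2)]: a triangle
print(gs_edges(M))          # set(): GS(M) is an independent set
```

Every pair of SNPs is covered by a single fragment, so no SNP conflict can arise even though the fragments form an odd cycle.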
THEOREM 2
For a gapless M, GS(M) is a perfect graph.
PROOF: GS(M) is the complement of a comparability graph A. Comparability graphs are perfect, and by the Perfect Graph Theorem (Lovász, 1972) so are their complements.
Comparability graphs: undirected graphs whose edges can be oriented to become a partial order.
LEMMA: If i < j < k and (i,k) is a SNP conflict, then either (i,j) or (j,k) is also a SNP conflict.
Proof idea: take two fragments witnessing the conflict (i,k), say they differ at i and agree at k. Since M is gapless, both also cover j.
- If they are equal at j (e.g., O O), then j conflicts with i.
- If they are different at j (e.g., O X), then j conflicts with k.
I.e., if (i,j) is not a conflict and (j,k) is not a conflict, then (i,k) is not a conflict either.
So the pairs (u,v) with u < v and u not in conflict with v, oriented from u to v, are transitively oriented: they form a comparability graph A, and GS(M) is the complement of A.
NOTE: maximum independent set on a perfect graph is polynomial (Lovász, Schrijver, Grötschel, 84).
![Page 63: Identification of SNP Alleles in DNA Sequences Giuseppe Lancia Università di Padova e Celera Genomics.](https://reader034.fdocuments.in/reader034/viewer/2022051821/5697c0041a28abf838cc4589/html5/thumbnails/63.jpg)
THEOREM: The min SNP removal is NP-hard if there can be gaps (Reduction from MAXCUT)
Again, gaps must be long for problem to be difficult.
We have an O(mn^{2L+1} + n^{2L+2}) dynamic program for MSR on a matrix with total gap length L.
Hence gapless MSR is polynomial: removing the fewest SNPs is equivalent to keeping a maximum stable set of the perfect graph GS(M).
We also have faster dynamic-programming algorithms, running in O(mn + m^2).
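Because non-conflict is transitive along u < v < w in a gapless matrix, a maximum stable set of GS(M) is a longest chain of columns in which each consecutive pair is non-conflicting. The sketch below finds it with a simple quadratic-in-n dynamic program; it is an illustration of that reduction under the same assumed conflict definition, not the O(mn + m^2) algorithm mentioned above. `msr_gapless` is a hypothetical name and the matrix is a made-up example.

```python
from itertools import combinations

GAP = "-"

def snp_conflict(M, s, t):
    """Assumed definition (as in GS(M)): some pair of fragments covers both
    SNPs and agrees at exactly one of them."""
    for f, g in combinations(M, 2):
        if GAP not in (f[s], g[s], f[t], g[t]) and (f[s] == g[s]) != (f[t] == g[t]):
            return True
    return False

def msr_gapless(M):
    """Columns kept by a minimum SNP removal on a gapless matrix M.

    best[v] = size of the largest conflict-free set of columns ending at v.
    Chaining only consecutive non-conflicting columns is enough because,
    by the lemma, non-conflict is transitive when M is gapless.
    """
    n = len(M[0]) if M else 0
    if n == 0:
        return []
    best, prev = [1] * n, [None] * n
    for v in range(n):
        for u in range(v):
            if best[u] + 1 > best[v] and not snp_conflict(M, u, v):
                best[v], prev[v] = best[u] + 1, u
    v = max(range(n), key=best.__getitem__)
    keep = []
    while v is not None:  # walk the chain back to its first column
        keep.append(v)
        v = prev[v]
    return keep[::-1]

M = ["OXO", "XXO", "-OX"]
kept = msr_gapless(M)
print(kept)                                            # [1, 2]
print([c for c in range(len(M[0])) if c not in kept])  # removed: [0]
```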
What if there are gaps?
The Population Haplotyping Problem
The input is GENOTYPE data (x, o: the same allele on both chromosomes; ?: ambiguous site, where the two haplotypes differ):
INPUT: G = { xx??x, ????x, xxoxx, ?x??x, oooxx }
OUTPUT: H = { xxoxx, xxxox, oooxx, oxxox }
xx??x = xxoxx + xxxox
????x = oooxx + xxxox
xxoxx = xxoxx + xxoxx
?x??x = xxoxx + oxxox
oooxx = oooxx + oooxx
Each genotype is explained by two haplotypes
We will define some objectives for H
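To make "explained by two haplotypes" concrete, here is a minimal Python sketch, assuming the site-wise rule the example follows: where the two haplotypes agree, the genotype keeps that allele; where they differ, it shows ?. `combine` and `resolution` are illustrative names; the pairs are the ones in the example above.

```python
def combine(h1, h2):
    """Genotype explained by haplotypes h1, h2 (equal-length strings over x/o)."""
    assert len(h1) == len(h2)
    return "".join(a if a == b else "?" for a, b in zip(h1, h2))

# Resolution of the example: genotype -> (haplotype, haplotype)
resolution = {
    "xx??x": ("xxoxx", "xxxox"),
    "????x": ("oooxx", "xxxox"),
    "xxoxx": ("xxoxx", "xxoxx"),
    "?x??x": ("xxoxx", "oxxox"),
    "oooxx": ("oooxx", "oooxx"),
}
for g, (h1, h2) in resolution.items():
    assert combine(h1, h2) == g  # every genotype is explained by its pair

H = {h for pair in resolution.values() for h in pair}
print(sorted(H))  # ['oooxx', 'oxxox', 'xxoxx', 'xxxox']
```

Here |H| = 4, which is the quantity the 1st objective asks to minimize.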
1st Objective (open research problem):
minimize |H|
2nd Objective, based on an inference rule:
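Inference Rule
  xoxxooxoxx   known haplotype h
+ xxoxooxxxo   new (derived) haplotype h'
= x??xoox?x?   known (ambiguous) genotype g
We write h + h' = g.
g and h must be compatible to derive h': h must agree with g at every non-ambiguous site. Then h' copies g at those sites and takes the opposite allele of h at each ?.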
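The rule can be sketched in a few lines of Python, assuming haplotypes are strings over x/o and genotypes use ? for ambiguous sites; `compatible` and `derive` are illustrative names, and the check reproduces the example above.

```python
def compatible(h, g):
    """h agrees with g at every site where g is not ambiguous ('?')."""
    return len(h) == len(g) and all(c == "?" or c == a for a, c in zip(h, g))

def derive(h, g):
    """Return h' with h + h' = g: copy g's resolved sites, flip h at '?' sites."""
    if not compatible(h, g):
        raise ValueError("h is not compatible with g")
    flip = {"x": "o", "o": "x"}
    return "".join(flip[a] if c == "?" else c for a, c in zip(h, g))

print(derive("xoxxooxoxx", "x??xoox?x?"))  # xxoxooxxxo
```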
2nd Objective (Clark, 1990)
1. Start with H = non-ambiguous genotypes
2. while there exists an ambiguous genotype g in G
3.    take h in H compatible with g and let h + h' = g
4.    set H = H + {h'} and G = G - {g}
5. end while
If, at the end, G is empty: SUCCESS; otherwise: FAILURE.
Step 3 is non-deterministic: the choice of h (and of g) can change the outcome.
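A minimal Python sketch of one run of the rule, assuming the example genotypes on these slides are oooo, xooo, ??oo and xx??. The `choose` parameter is a hypothetical hook that makes the non-deterministic step 3 explicit; with different choices the same input ends in SUCCESS or FAILURE.

```python
def compatible(h, g):
    """h agrees with g at every site where g is not ambiguous ('?')."""
    return len(h) == len(g) and all(c == "?" or c == a for a, c in zip(h, g))

def derive(h, g):
    """h' such that h + h' = g (h assumed compatible with g)."""
    flip = {"x": "o", "o": "x"}
    return "".join(flip[a] if c == "?" else c for a, c in zip(h, g))

def clark(genotypes, choose=lambda pairs: pairs[0]):
    """One run of Clark's rule; returns (H, genotypes left in G).

    `choose` picks one (h, g) pair among all compatible ones (step 3).
    """
    H = [g for g in genotypes if "?" not in g]   # step 1
    G = [g for g in genotypes if "?" in g]
    while G:                                     # step 2
        pairs = [(h, g) for g in G for h in H if compatible(h, g)]
        if not pairs:
            break                                # stuck: FAILURE
        h, g = choose(pairs)                     # step 3
        h2 = derive(h, g)
        if h2 not in H:
            H.append(h2)                         # step 4
        G.remove(g)
    return H, G

example = ["oooo", "xooo", "??oo", "xx??"]
print(clark(example))                                  # (['oooo', 'xooo', 'xxoo', 'xxxx'], [])
print(clark(example, choose=lambda pairs: pairs[-1]))  # (['oooo', 'xooo', 'oxoo'], ['xx??'])
```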
Example: genotypes oooo, xooo, ??oo, xx??, so H = { oooo, xooo } and G = { ??oo, xx?? }
Choosing h = oooo for ??oo gives h' = xxoo; then h = xxoo resolves xx?? with h' = xxxx: SUCCESS
Choosing h = xooo for ??oo gives h' = oxoo; now no haplotype in H is compatible with xx??: FAILURE (can't resolve xx??)
OBJ: find the order of application of the rule that leaves the fewest elements in G.
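For tiny instances the objective can be computed by exhaustive search over every order and every choice of h, memoized on the current (H, G). This is only an illustration of what is being optimized: the search is exponential, consistent with the hardness result below. `fewest_unresolved` is a hypothetical name.

```python
def compatible(h, g):
    return len(h) == len(g) and all(c == "?" or c == a for a, c in zip(h, g))

def derive(h, g):
    flip = {"x": "o", "o": "x"}
    return "".join(flip[a] if c == "?" else c for a, c in zip(h, g))

def fewest_unresolved(H, G):
    """Minimum |G| left over all orders and choices of Clark's rule (exhaustive)."""
    memo = {}

    def search(H, G):
        if (H, G) not in memo:
            best = len(G)  # value if no rule application is possible
            for g in G:
                for h in H:
                    if compatible(h, g):
                        best = min(best, search(H | {derive(h, g)}, G - {g}))
            memo[(H, G)] = best
        return memo[(H, G)]

    return search(frozenset(H), frozenset(G))

print(fewest_unresolved({"oooo", "xooo"}, {"??oo", "xx??"}))  # 0
print(fewest_unresolved({"xooo"}, {"??oo", "xx??"}))          # 1
```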
- The problem is APX-hard (Gusfield, 00)
- Graph model + integer programming for a practical solution (G., 01)
1. Expand genotypes, e.g. x??o? expands to
xxxox, xxxoo, xxoox, xxooo, xoxox, xooox, xoxoo, xoooo
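Expansion replaces each ? independently by x or o, so a genotype with k ambiguous sites yields 2^k haplotypes. A minimal sketch using `itertools.product` (`expand` is an illustrative name):

```python
from itertools import product

def expand(g):
    """All haplotypes compatible with genotype g: each '?' becomes x or o."""
    choices = [("x", "o") if c == "?" else (c,) for c in g]
    return ["".join(p) for p in product(*choices)]

print(expand("x??o?"))
# ['xxxox', 'xxxoo', 'xxoox', 'xxooo', 'xoxox', 'xoxoo', 'xooox', 'xoooo']
```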
2. Create an arc (h, h') if there exists g such that h' can be derived from g and h
3. Largest number of nodes in a forest rooted at unambiguous genotypes = largest number of ambiguous genotypes resolved
Hence, find the largest number of nodes in a forest rooted at unambiguous genotypes, using an I.P. model with variables x(ij).
This reduction is exponential (a genotype with k ambiguous sites expands to 2^k haplotypes). Is there a better practical approach?
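Steps 1 and 2 of the graph model can be sketched as follows, assuming arcs are stored as plain (h, h') pairs and the roots are the unambiguous genotypes; the I.P. over the variables x(ij) is not shown. `clark_graph` is a hypothetical name, and the input is the oooo, xooo, ??oo, xx?? example used earlier.

```python
from itertools import product

def expand(g):
    """All haplotypes compatible with genotype g."""
    return ["".join(p) for p in product(*[("x", "o") if c == "?" else (c,) for c in g])]

def derive(h, g):
    """h' with h + h' = g, for h compatible with g."""
    flip = {"x": "o", "o": "x"}
    return "".join(flip[a] if c == "?" else c for a, c in zip(h, g))

def clark_graph(genotypes):
    """Nodes, arcs (h, h') and roots of the graph model (sketch).

    Every h in expand(g) is compatible with g, so each one yields an arc.
    """
    roots = {g for g in genotypes if "?" not in g}
    nodes, arcs = set(roots), set()
    for g in genotypes:
        if "?" in g:
            for h in expand(g):
                nodes.add(h)
                arcs.add((h, derive(h, g)))
    return nodes, arcs, roots

nodes, arcs, roots = clark_graph(["oooo", "xooo", "??oo", "xx??"])
print(len(nodes), len(arcs))                                   # 7 8
print(("oooo", "xxoo") in arcs and ("xxoo", "xxxx") in arcs)  # True
```

The path oooo, xxoo, xxxx from the root oooo corresponds to the successful run of the rule in the example.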
3rd Objective (open research problem): Disease Detection
INPUT: G = { xx??x, ????x, ??oxx, ?x??x, oooxx }
OUTPUT: H = { xxoxx, xxxox, oooxx, oxxox }
xx??x = xxoxx + xxxox
????x = oooxx + xxxox
??oxx = xxoxx + oooxx
?x??x = xxoxx + oxxox
oooxx = oooxx + oooxx
H contains a subset H' such that each diseased individual has one of its haplotypes in H' and each healthy individual has none.
minimize |H'|
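For a FIXED resolution the smallest H' can be found by brute force, as in the sketch below. Several assumptions are involved: "has one haplotype in H'" is read as "at least one"; the disease labels are hypothetical (the slides do not give them); and the real problem also chooses the resolution itself, which this sketch does not attempt. `min_disease_set` is an illustrative name; the resolution is the one in the example above.

```python
from itertools import combinations

def min_disease_set(resolution, diseased):
    """Smallest H' separating diseased from healthy for a fixed resolution.

    Haplotypes carried by any healthy individual are excluded; the rest are
    tried in subsets of increasing size (exponential, tiny inputs only).
    """
    healthy_haps = {h for g, pair in resolution.items() if g not in diseased for h in pair}
    allowed = sorted({h for g in diseased for h in resolution[g]} - healthy_haps)
    for size in range(len(allowed) + 1):
        for Hp in combinations(allowed, size):
            if all(set(resolution[g]) & set(Hp) for g in diseased):
                return set(Hp)
    return None  # no valid H' for this resolution

resolution = {
    "xx??x": ("xxoxx", "xxxox"),
    "????x": ("oooxx", "xxxox"),
    "??oxx": ("xxoxx", "oooxx"),
    "?x??x": ("xxoxx", "oxxox"),
    "oooxx": ("oooxx", "oooxx"),
}
# Hypothetical labels: the first two individuals are diseased.
print(min_disease_set(resolution, {"xx??x", "????x"}))  # {'xxxox'}
```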
THE END © MMII G.L.