Lecture: Graph-Based Genome Scaffolding - IGM, igm.univ-mlv.fr/~mweller/scaf_lec.pdf
DNA
- double strand
- inside nucleus (safe)
RNA
- single strand
- outside nucleus
- transfers genetic code
- Thymine (T) → Uracil (U)
Polymerase
Sanger Sequencing [Sanger et al. ’77]
CCTGGACGGGTCAGACATGACAGTGGCCCCAAGATTCACAAGATCGTATCTCAATACAGTAAACGAGCAAT
GGACCTGCCCAGTCTGTACTGTCACCGGGGTTCTAAGTGTTCTAGCATAGAGTTATGTCATTTGCTCGTTA
GGA*
GGACCTGCCCA*
GGACCTGCCCAGTCTGTA*
1. split helix & create thousands of copies
2. add polymerase & floating bases: A C G T
3. add a special base: A* (polymerase cannot extend)
4. stir & let polymerase act
5. measure the length of each fragment
each length is the position of a T in the template
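The relation between fragment lengths and T positions can be checked on the slide's own template. This is a minimal Python sketch, assuming the copy starts at the first template base and that A* is the only terminator; the function names are illustrative and not from the lecture.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sanger_fragment_lengths(template, terminator="A"):
    """Lengths of all copies that can end in the terminator base."""
    lengths = []
    for position, base in enumerate(template, start=1):
        # The polymerase pairs each template base with its complement.
        # Wherever that complement is the terminator, a copy may stop.
        if COMPLEMENT[base] == terminator:
            lengths.append(position)
    return lengths


def fragment(template, length):
    """The terminated copy of the first `length` template bases."""
    return "".join(COMPLEMENT[base] for base in template[:length]) + "*"


template = "CCTGGACGGGTCAGACATGA"  # first 20 bases of the slide's template
lengths = sanger_fragment_lengths(template)
print(lengths)                                  # [3, 11, 18]
print([fragment(template, n) for n in lengths])
```

The three lengths correspond to the fragments GGA*, GGACCTGCCCA* and GGACCTGCCCAGTCTGTA* shown on the slide. Repeating the experiment with C*, G* and T* would give the positions of the remaining bases.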
Problem: reads become unreliable after a couple hundred bp
Solution: chop up DNA into pieces and read those
Next Generation Sequencing
[Figure: the strand TGGTACTCA......ACCTCTCAG and its reverse complement CTGAGAGGT......TGAGTACCA attached to anchors on the flow cell]
1. chop DNA into smaller pieces
2. add anchors to each end of each piece
3. “flow cell” containing anchor places
4. strand anchors its two ends to two anchor places
5. polymerase completes the strand into double-strand
6. double strand is denatured into single strands
7. rinse, repeat (last 3 steps) until flow cell is “full”
8. read all strands from their anchor points outwards
Paired-End reads (distance between reads = “insert size”)
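To make the notion of a read pair concrete, here is a small Python sketch. It assumes that each read covers a fixed number of bases starting at one end of the fragment, and it uses the slide's definition of insert size (the distance between the two reads; some tools instead report the whole fragment length). The middle part of the fragment and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """The opposite strand, read in its own 5'-to-3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def paired_end_reads(fragment, read_length):
    """Read `read_length` bases from each end of the fragment."""
    if 2 * read_length > len(fragment):
        raise ValueError("fragment too short: the two reads would overlap")
    first = fragment[:read_length]
    # The second read comes from the opposite strand, so it is the
    # reverse complement of the fragment's last `read_length` bases.
    second = reverse_complement(fragment)[:read_length]
    insert_size = len(fragment) - 2 * read_length
    return first, second, insert_size


# Fragment ends taken from the slide; the 14 middle bases are made up.
fragment = "TGGTACTCA" + "GATTACAGATTACA" + "ACCTCTCAG"
first, second, insert_size = paired_end_reads(fragment, read_length=9)
print(first, second, insert_size)  # TGGTACTCA CTGAGAGGT 14
```

The second read, CTGAGAGGT, is the start of the reverse-complement strand shown on the slide.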
Sequence Assembly: Overview
Sequence:
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
Reads:
TTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC
ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACA
TCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC
Reads merged by overlap:
TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
Goal: reconstruct sequence
Idea: overlap reads
Problem 1: parts of the sequence might not be covered by reads
Solution: sequence with “high coverage”
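A rough way to see why high coverage helps is sketched below in Python. It is not taken from the lecture: it assumes reads start at uniformly random positions (a Poisson model) and ignores effects at the ends of the sequence. All numbers and function names are hypothetical.

```python
import math


def expected_coverage(num_reads, read_length, genome_length):
    """Average number of reads covering a position."""
    return num_reads * read_length / genome_length


def expected_uncovered_fraction(coverage):
    """Poisson estimate of the fraction of positions hit by no read."""
    return math.exp(-coverage)


genome_length = 1_000_000  # hypothetical genome size in bp
read_length = 100          # hypothetical read length in bp

for num_reads in (50_000, 300_000):
    c = expected_coverage(num_reads, read_length, genome_length)
    gap = expected_uncovered_fraction(c)
    print(f"coverage {c:.0f}x: about {gap * genome_length:.3g} uncovered bp")
```

Under these assumptions, 5x coverage still leaves thousands of positions uncovered, while at 30x the expected number of uncovered positions is far below one.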
Problem 2: Shortest Common Superstring is NP-hard
“Overlap-Layout-Consensus” assemblers
1. produce best pairwise overlaps
2. layout the reads according to the overlaps
3. for each position, compute consensus base
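The three steps above can be sketched in Python on three reads from the slide. This is a simplified illustration under stated assumptions: reads are error-free, overlaps must be exact, and the layout is a greedy heuristic that merges the best-overlapping pair first (it does not solve Shortest Common Superstring exactly). The function names and the minimum overlap of 3 are choices made for this example.

```python
from collections import Counter
from itertools import permutations


def overlap(a, b, min_length=3):
    """Length of the longest suffix of `a` that is a prefix of `b`."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_layout(reads, min_length=3):
    """Merge the pair with the largest overlap until none is left."""
    contigs = list(reads)
    while True:
        best_length, best_a, best_b = 0, None, None
        # Step 1: all ordered pairs are compared (quadratic in the
        # number of sequences).
        for a, b in permutations(contigs, 2):
            length = overlap(a, b, min_length)
            if length > best_length:
                best_length, best_a, best_b = length, a, b
        if best_length == 0:
            return contigs
        # Step 2: lay out the two sequences on top of each other.
        contigs.remove(best_a)
        contigs.remove(best_b)
        contigs.append(best_a + best_b[best_length:])


def consensus_base(column):
    """Step 3: the most frequent base among reads covering a position."""
    return Counter(column).most_common(1)[0][0]


reads = ["TTTGCCCCTGAACTT", "ACTTCGC", "TCGCTAGGGTTCTCTAACGA"]
print(overlap(reads[0], reads[1]))  # 4
print(greedy_layout(reads))         # ['TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGA']
print(consensus_base("TTCT"))       # T
```

The merged result is the first 34 bases of the slide's sequence. With real reads, sequencing errors make the overlaps inexact, which is why a separate consensus step is needed.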
Problem: Θ(n²) pairwise overlaps, too slow in practice
De Bruijn graph based assembly
1. chop all reads into “k-mers”
2. build overlap graph (“De Bruijn graph”)
3. find
Example (k = 4): GAAC, AACT, ACTT, CTTC, TTCG, TCGC, CGCT, CCTT, CTTG, TTGG
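Steps 1 and 2 can be sketched in Python as follows. This assumes, as the k-mer list on the slide suggests, that the k-mers are the nodes and that an edge joins two k-mers overlapping in k-1 characters (many texts instead use (k-1)-mers as nodes and k-mers as edges). The two input strings are substrings of the slide's sequence chosen so that they yield exactly the ten k-mers listed; the function names are illustrative. Step 3 is cut off in the source, so the sketch stops after building the graph.

```python
from collections import defaultdict


def kmers(read, k):
    """All substrings of length k, from left to right."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_graph(reads, k):
    """Map each k-mer to the k-mers that can follow it.

    An edge u -> v exists when the last k-1 characters of u equal
    the first k-1 characters of v.
    """
    nodes = sorted({kmer for read in reads for kmer in kmers(read, k)})
    by_prefix = defaultdict(list)
    for node in nodes:
        by_prefix[node[:-1]].append(node)
    # Looking up successors by prefix avoids comparing all node pairs.
    return {node: by_prefix.get(node[1:], []) for node in nodes}


reads = ["GAACTTCGCT", "CCTTGG"]
graph = build_graph(reads, k=4)
for node, successors in graph.items():
    print(node, "->", successors)
```

In the resulting graph, both ACTT and CCTT are followed by CTTC and CTTG, so the shared 3-mer CTT creates a branch that the path-finding step has to resolve.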
5/33
![Page 33: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/33.jpg)
Sequence Assembly: Overview
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTC CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC
ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
Goal: reconstruct sequence
Idea: overlap reads
Problem 2: Shortest Common Superstring is NP-hard
“Overlap-Layout-Consensus” assemblers
1. produce best pairwise overlaps
2. layout the reads according to the overlaps
3. for each position, compute consensus base
Problem: Θ(n2) too slow in practice
DeBruijn-graph based assembly
1. chop all reads into “k-mers”
2. builds overlap graph
(“DeBruijn graph”)
3. find
k = 4
GAAC
AACT
ACTTCTTC
TTCG
TCGC
CGCT
CCTT
CTTG
TTGG
5/33
![Page 36: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/36.jpg)
Sequence Assembly: Overview
[Figure: short reads sampled from the target sequence are overlapped and merged into longer sequences]
Goal: reconstruct sequence
Idea: overlap reads
Problem 2: Shortest Common Superstring is NP-hard
“Overlap-Layout-Consensus” assemblers
Problem: Θ(n²) too slow in practice
De Bruijn-graph based assembly
1. chop all reads into “k-mers”
2. build overlap graph (“De Bruijn graph”)
3. find Eulerian path (a path using all overlaps)
Example (k = 4): GAAC, AACT, ACTT, CTTC, TTCG, TCGC, CGCT and CCTT, CTTG, TTGG
5/33
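The three De Bruijn steps can be sketched as follows. This is a simplified illustration under stated assumptions: error-free reads, each distinct k-mer used once (multiplicities are ignored), and an Eulerian path that is known to exist. The helper names `de_bruijn`, `eulerian_path` and `spell` are hypothetical; the two reads are made up so that their 4-mers match the first chain of the example above.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Nodes are (k-1)-mers; every distinct k-mer contributes one edge."""
    kmers = {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph

def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next((u for u in graph if len(graph[u]) - in_deg[u] == 1),
                 next(iter(graph)))
    adj = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if adj.get(u):
            stack.append(adj[u].pop())   # follow an unused edge
        else:
            path.append(stack.pop())     # dead end: node is finished
    return path[::-1]

def spell(path):
    """Glue the (k-1)-mers of a path back into one sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])

reads = ["GAACTTC", "CTTCGCT"]
print(spell(eulerian_path(de_bruijn(reads, 4))))  # GAACTTCGCT
```

Unlike the pairwise comparison of reads, building the graph takes one pass over the reads, and an Eulerian path can be found in time linear in the number of edges.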
![Page 39: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/39.jpg)
Sequence Assembly: Overview
[Figure: reads are merged into longer contiguous sequences; in the last step two of them are joined across a gap of unknown bases (NNNN)]
Goal: reconstruct sequence
Idea: overlap reads
Problem 3: repeats (common in DNA) make assembly ambiguous
End product: a set of “contiguous regions” (contigs)
Problem: a “contig soup” is not very useful
But: we have paired-end information!
5/33
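Why repeats make assembly ambiguous can be seen on a toy example. The two sequences below are invented for illustration: both contain the repeat CC three times, with the segments between the repeats swapped, and they have exactly the same 3-mers. The helper `kmer_spectrum` is a hypothetical name.

```python
from collections import Counter

def kmer_spectrum(seq, k):
    """Multiset of all k-mers occurring in `seq`."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

s1 = "TCCACCGCCT"   # T CC A CC G CC T
s2 = "TCCGCCACCT"   # T CC G CC A CC T  (middle segments swapped)

print(s1 != s2)                                      # True
print(kmer_spectrum(s1, 3) == kmer_spectrum(s2, 3))  # True
```

No assembler that only sees the 3-mers can tell the two sequences apart, which is why the output is a set of contigs rather than one sequence.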
![Page 46: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/46.jpg)
Genome Scaffolding: Previous Work
Goal: order & orient contigs
Idea: use pairing information on reads to “link” contigs together
- SOPRA [Dayarian, Michael, Sengupta, BMC Bioinf. 11, ’10]
  - removes reads in high-coverage areas (likely repeats)
  - orientation step (heuristic) + ordering step (heuristic)
  - coded in Perl (!!!)
  - (observed sparse contig graph)
- SSPACE [Boetzer & al., Bioinf. 27(4), ’11]
- OPERA [Gao, Sung, Nagarajan, JCB 18(11), ’11]
- GRASS [Gritsenko & al., Bioinf. 28(11), ’12]
- SCARPA [Donmez, Brudno, Bioinf. 29(4), ’13]
- . . . [Huson & al., JACM, ’02] [Nieuwerburgh & al., NAR, ’12]
6/33
![Page 47: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/47.jpg)
Genome Scaffolding: Previous Work
- SSPACE [Boetzer & al., Bioinf. 27(4), ’11]
  - heuristic contig extension
  - “reasonable time”
6/33
![Page 48: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/48.jpg)
Genome Scaffolding: Previous Work
- OPERA [Gao, Sung, Nagarajan, JCB 18(11), ’11]
  - n^(p+O(1)) time, where p = # edge deletions (≥ feedback edge set)
  - most work done by a heuristic “graph contraction”
6/33
![Page 49: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/49.jpg)
Genome Scaffolding: Previous Work
- GRASS [Gritsenko & al., Bioinf. 28(11), ’12]
  - Mixed-Integer Quadratic Programming
  - deals with uncertain data (slack variables)
  - “intractable even for small # of contigs”
  - heuristic workaround: solve relaxed formulation & use slack values → ILP
6/33
![Page 50: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/50.jpg)
Genome Scaffolding: Previous Work
- SCARPA [Donmez, Brudno, Bioinf. 29(4), ’13]
  - orientation step: use FPT algorithm for Odd Cycle Transversal
  - ordering step: heuristic
6/33
![Page 53: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/53.jpg)
Graph-Based Scaffolding
[Figure: example contigs and the scaffold graph built from them; edges carry integer weights]
Strategy
1. map reads onto contigs
2. pair contigs according to read-pairing (weighted)
3. cover “scaffold graph” with (heavy) alternating paths
   (each path corresponds to a chromosome)
7/33
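Steps 1 and 2 of the strategy can be sketched as follows, assuming the read mapping has already been done by some external tool. Everything here is illustrative: the function name `build_scaffold_graph`, the end labels "L"/"R", and the input format (one pair of contig ends per read pair) are assumptions, and the weight of a link is simply the number of read pairs supporting it.

```python
from collections import Counter

def build_scaffold_graph(contigs, pair_hits):
    """contigs:   iterable of contig names
    pair_hits: iterable of ((contig, end), (contig, end)), one entry per
               read pair, giving the contig end each mate was mapped to.
    Returns the contig edges (a perfect matching on the contig ends) and
    the weighted links between ends of different contigs."""
    matching = [((c, "L"), (c, "R")) for c in contigs]
    weights = Counter()
    for end1, end2 in pair_hits:
        if end1[0] != end2[0]:  # pairs inside one contig carry no link information
            weights[frozenset((end1, end2))] += 1
    return matching, weights

hits = [(("A", "R"), ("B", "L")),
        (("B", "L"), ("A", "R")),
        (("A", "L"), ("A", "R")),
        (("B", "R"), ("C", "L"))]
matching, weights = build_scaffold_graph(["A", "B", "C"], hits)
print(weights[frozenset((("A", "R"), ("B", "L")))])  # 2
```

In this representation every vertex (contig end) lies on exactly one contig edge, so a path that alternates between contig edges and links spells out an order and orientation of contigs.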
![Page 62: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/62.jpg)
Graph-Based Scaffolding
[Figure: example contigs and the scaffold graph built from them; edges carry integer weights]
Scaffolding
Input: graph G, perfect matching M, weights ω, and k, σ_p, σ_c ∈ ℕ
Question: Can G be covered by ≤ σ_p alternating paths and ≤ σ_c alternating cycles of total weight ≥ k?
7/33
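One way to read this definition is as a check on a candidate solution: a set of non-matching edges which, together with M, decomposes the vertices into paths and cycles. The sketch below follows that reading; it assumes weights are given on non-matching edges only and that every chosen edge joins two matched vertices. The name `check_scaffolding` and the input format are hypothetical.

```python
from collections import defaultdict

def check_scaffolding(matching, chosen, weight, k, sigma_p, sigma_c):
    """matching: list of (u, v) pairs forming the perfect matching M
    chosen:   list of selected non-matching edges (u, v)
    weight:   dict frozenset({u, v}) -> weight of a non-matching edge"""
    # Every vertex already has one matching edge, so the cover alternates
    # iff no vertex is touched by more than one chosen edge.
    link_degree = defaultdict(int)
    for u, v in chosen:
        link_degree[u] += 1
        link_degree[v] += 1
    if any(d > 1 for d in link_degree.values()):
        return False

    # Union-find to group vertices into paths / cycles.
    parent = {x: x for edge in matching for x in edge}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in list(matching) + list(chosen):
        parent[find(u)] = find(v)

    # A component is a cycle iff each of its vertices has a chosen edge.
    is_cycle = {}
    for x in parent:
        root = find(x)
        is_cycle[root] = is_cycle.get(root, True) and link_degree[x] == 1
    cycles = sum(is_cycle.values())
    paths = len(is_cycle) - cycles
    total = sum(weight[frozenset(e)] for e in chosen)
    return paths <= sigma_p and cycles <= sigma_c and total >= k

M = [("a1", "a2"), ("b1", "b2")]
w = {frozenset(("a2", "b1")): 5, frozenset(("b2", "a1")): 3}
print(check_scaffolding(M, [("a2", "b1")], w, k=5, sigma_p=1, sigma_c=0))  # True
```

Verifying a solution is easy; the hardness discussed in the lecture lies in finding one.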
![Page 63: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/63.jpg)
Graph-Based Scaffolding
[Figure: example contigs and the scaffold graph built from them; edges carry integer weights]
Exact Scaffolding
Input: graph G, perfect matching M, weights ω, and k, σ_p, σ_c ∈ ℕ
Question: Can G be covered by exactly σ_p alternating paths and exactly σ_c alternating cycles of total weight ≥ k?
7/33
![Page 64: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/64.jpg)
[Figure: example graph with numeric edge labels]
8/33
![Page 67: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/67.jpg)
Hardness Warm up: Hamiltonian Path
Recall: Scaffolding
Input: graph G, perfect matching M, weights ω, and k, σ_p, σ_c ∈ ℕ
Question: Can G be covered by ≤ σ_p alternating paths and ≤ σ_c alternating cycles of total weight ≥ k?
Construction
Given a directed graph D:
1. make a copy of D
2. duplicate all vertices; each vertex and its copy form an edge of M
3. “slide” down all arrow tips (onto the copies) & ignore directions; the result is G
9/33
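The construction can be written down in a few lines. This is a sketch of one reading of the slide: each vertex v of D becomes the two vertices (v, "orig") and (v, "copy") joined by a matching edge, and each arc u → v becomes an undirected edge from u's original to v's copy. The labels "orig"/"copy" and the function name are hypothetical.

```python
def hamiltonian_to_scaffold_graph(vertices, arcs):
    """Turn a directed graph D = (vertices, arcs) into (M, E):
    M: one matching edge per vertex of D (vertex and its copy)
    E: one undirected non-matching edge per arc of D"""
    matching = [((v, "orig"), (v, "copy")) for v in vertices]
    # The arrow tip of u -> v is moved to the copy of v; the tail stays at u.
    edges = [frozenset(((u, "orig"), (v, "copy"))) for u, v in arcs]
    return matching, edges

M, E = hamiltonian_to_scaffold_graph(["a", "b", "c"],
                                     [("a", "b"), ("b", "c"), ("c", "a")])
print(len(M), len(E))  # 3 3
```

The graph has twice as many vertices as D, and M is a perfect matching by construction.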
![Page 70: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/70.jpg)
Hardness Warm up: Hamiltonian Path
Recall: Scaffolding
Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &
≤ σc alternating cycles of total weight ≥ k?
Lemma
D admits a directed Hamiltonian path ⇔ M can be covered with a
single alternating path in G .
9/33
![Page 71: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/71.jpg)
Hardness Warm up: Hamiltonian Path
Lemma
D admits a directed Hamiltonian path ⇔ M can be covered with a
single alternating path in G.
“⇒”: replace each v in the Hamiltonian path by vup → vlow.
The result is alternating ✓ and covers M ✓
9/33
![Page 73: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/73.jpg)
Hardness Warm up: Hamiltonian Path
“⇐”: contract each matching edge in the covering alternating path.
The result hits all vertices exactly once ✓ and is a valid directed path ✓
9/33
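The equivalence in the Lemma can be checked by brute force on small inputs. The sketch below is only a sanity check under assumptions: all function names are hypothetical, the arc u → v is assumed to become the edge {ulow, vup}, and the search is exponential, so it is meant for digraphs with a handful of vertices.

```python
from itertools import combinations, permutations


def build_scaffold_graph(vertices, arcs):
    # Assumed encoding: v becomes (v, "up"), (v, "low"); arc u -> v becomes {u_low, v_up}.
    matching = {frozenset(((v, "up"), (v, "low"))) for v in vertices}
    non_matching = {frozenset(((u, "low"), (v, "up"))) for u, v in arcs if u != v}
    return matching, non_matching


def has_hamiltonian_path(vertices, arcs):
    # Try every vertex order and test whether consecutive vertices are joined by arcs.
    arcs = set(arcs)
    return any(
        all((a, b) in arcs for a, b in zip(order, order[1:]))
        for order in permutations(vertices)
    )


def has_single_alternating_cover(matching, non_matching):
    # Is there one alternating path in G that contains every edge of M?
    partner = {}
    for e in matching:
        a, b = tuple(e)
        partner[a], partner[b] = b, a
    nbrs = {v: set() for v in partner}
    for e in non_matching:
        a, b = tuple(e)
        nbrs[a].add(b)
        nbrs[b].add(a)

    def extend(end, visited):
        # `end` is the far endpoint of the last matching edge on the path.
        if len(visited) == len(partner):
            return True
        return any(
            w not in visited and extend(partner[w], visited | {w, partner[w]})
            for w in nbrs[end]
        )

    return any(extend(partner[s], {s, partner[s]}) for s in partner)


def lemma_holds_for_all_digraphs(n):
    # Compare both sides of the Lemma on every loop-free digraph with n >= 1 vertices.
    vertices = list(range(n))
    all_arcs = [(u, v) for u in vertices for v in vertices if u != v]
    for r in range(len(all_arcs) + 1):
        for arcs in combinations(all_arcs, r):
            m, e = build_scaffold_graph(vertices, arcs)
            if has_hamiltonian_path(vertices, arcs) != has_single_alternating_cover(m, e):
                return False
    return True
```

Calling `lemma_holds_for_all_digraphs(3)` compares the two sides on all 64 loop-free digraphs with three vertices.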
![Page 75: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/75.jpg)
Hardness Warm up: Hamiltonian Path
Theorem
Scaffolding is NP-hard, even restricted to
• bipartite graphs
• (σp, σc) ∈ {(0, 1), (1, 0)} and
• ω : E → {0}.
Corollary
Scaffolding with 2 weights is NP-hard in any
sufficiently dense graph class.
9/33
![Page 76: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/76.jpg)
Hardness Warm up: Hamiltonian Path
Theorem
Scaffolding is NP-hard, even restricted to
• supergraphs of bipartite graphs
• (σp, σc) ∈ {(0, 1), (1, 0)} and
• ω : E → {0, 1}.
Corollary
Scaffolding with 2 weights is NP-hard in any
sufficiently dense graph class.
9/33
![Page 78: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/78.jpg)
Hardness Warm up: Hamiltonian Path
Theorem
Exact Scaffolding is NP-hard, even restricted to
• supergraphs of bipartite graphs
• (σp, σc) ∈ {(0, 1), (1, 0)} and
• ω : E → {0, 1}.
Corollary
Exact Scaffolding with 2 weights is NP-hard in any
sufficiently dense graph class.
9/33
![Page 79: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/79.jpg)
Scaffolding in Co-Bipartites
Wait, what?
Recap: Corollary
Scaffolding with 2 weights is NP-hard in any
sufficiently dense graph class.
Unweighted!
10/33
![Page 83: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/83.jpg)
Scaffolding in Unweighted Co-Bipartites
[Figure: the two cliques X and Y; one configuration is marked “forbidden!”]
Observation
no edges between X & Y ⇒ need 2 objects (paths/cycles)
otherwise ⇒ can always cover G with 1 path
TODO
decide if we can cover with 1 cycle
11/33
![Page 84: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/84.jpg)
Scaffolding in Unweighted Co-Bipartites
[Figure: the two cliques X and Y; one configuration is marked “forbidden!”]
Observation
∃ alternating cycle with a non-matching edge in X ⇒ extend it to cover all of M in G[X]
Observation
• #matching edges between X & Y even (and > 0) ⇒ ✓
• #matching edges between X & Y odd ⇒ find any non-matching edge between X & Y
• #matching edges between X & Y is 0 ⇒ all other cases are ✓ (tedious case analysis)
11/33
![Page 97: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/97.jpg)
Scaffolding in Unweighted Co-Bipartites
Theorem
Scaffolding can be solved in O(n + m) time on co-bipartite graphs
11 / 33
![Page 98: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/98.jpg)
Scaffolding in Unweighted Trees
Observation
no alternating cycles in a tree
Observation
consider a lowest leaf ℓ
M is perfect ⇒ ℓ is matched (to its parent p)
⇒ parent p of ℓ has only 1 child
Case 1: parent g of p is matched “below”
⇒ g is matched to a leaf ℓ′
⇒ always take ℓ−p−g−ℓ′
[Figure: lowest leaf ℓ, its parent p, and p's parent g]
Case 2: parent g of p is matched “above”
• either p is the only child of g ⇒ delete ℓ & g and reduce k
• or g has another child u ⇒ u is matched “below” ⇒ ∃ “clone” of g−p−ℓ ⇒ take p−ℓ
12/33
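The structural part of the observation can be checked mechanically. The sketch below only locates a lowest leaf and reports which case applies; it does not implement the full linear-time algorithm. The representation (a `parent` dictionary with the root mapped to `None`, M as a set of `frozenset` pairs) and the function name are assumptions, and the tree must have a perfect matching.

```python
def classify_lowest_leaf(parent, matching):
    """Return (leaf, p, g, case) for a lowest leaf of a rooted tree.

    case is "below" (Case 1), "above" (Case 2), or "no grandparent".
    """
    partner = {}
    for e in matching:
        a, b = tuple(e)
        partner[a], partner[b] = b, a

    def depth(v):
        d = 0
        while parent[v] is not None:
            v, d = parent[v], d + 1
        return d

    leaf = max(parent, key=depth)  # a deepest vertex is always a leaf
    p = parent[leaf]
    siblings = [v for v in parent if parent[v] == p and v != leaf]

    # Observation: the leaf is matched to p, so p cannot have a second child.
    # (A second child would be another lowest leaf that also needs p.)
    assert partner[leaf] == p and not siblings, "M must be a perfect matching"

    g = parent[p]
    if g is None:
        return leaf, p, None, "no grandparent"
    case = "above" if partner[g] == parent[g] else "below"
    return leaf, p, g, case
```

In Case 1 the partner of g is a child of g; that child sits at the same depth as p and cannot have children of its own, which is why it is the leaf ℓ′.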
![Page 115: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/115.jpg)
Scaffolding in Unweighted Trees
Theorem
Scaffolding can be solved in O(n) time on unweighted trees
![Page 116: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/116.jpg)
Scaffolding in Weighted Trees
Dynamic Programming Idea
bottom-up traversal; in each vertex v, need to remember:
• #paths used below v
• is v incident with a non-matching edge?
Semantics
[p, x]_v = max. weight collected below v with p finished paths “under x”, where x ∈ {✓, ✗} records whether v is incident with a non-matching edge
[Figure: example tree with vertices v, v0, v1, …, v5 and vj, edge weights, and the table entries stored at the vertices]
Recurrence
Let v_1, v_2, …, v_c be the children of v.
$$[p, ✗]_v := \max_{\substack{p_1, p_2, \ldots, p_c \\ \sum_i p_i = p}} \; \sum_{1 \le i \le c} \; \max_{x \in \{✓, ✗\}} [p_i, x]_{v_i}$$
BAD IDEA!
Semantics
[j, p, x]_v = max. weight collected below v with p finished paths “under x”, using only the children up to v_j
(abbrev.: for the last child, write [p, x]_v)
Recurrence
Let v1, v2, . . . , vc be the children of v .
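$$[0, 0, ✗]_v := 0$$
$$[j, p, x]_v := \max_{p_j \le p} \begin{cases} \max\{[p_j, ✓]_{v_j},\, [p_j, ✗]_{v_j}\} + [j-1,\, p-p_j,\, x]_v & \text{if } vv_j \notin M \\ \omega(vv_j) + [p_j+1, ✗]_{v_j} + [j-1,\, p-p_j,\, ✗]_v & \text{if } x = ✓ \text{ and } vv_j \notin M \\ \left\{\begin{matrix} [p_j-1, ✓]_{v_j} \\ [p_j-1, ✗]_{v_j} \end{matrix}\right\} + [j-1,\, p-p_j,\, x]_v & \text{if } vv_j \in M \end{cases}$$
(✓ / ✗: v is / is not incident with a non-matching edge)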
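One plausible reading of why the first recurrence is a bad idea: its maximum ranges over all ways to split p among the c children, which is exponentially many tuples, whereas the child index j lets each step take a maximum over p_j ≤ p only. The Python sketch below illustrates just this child-by-child combination (a max-plus convolution). It is a simplification and not the slide's full recurrence: it ignores the flag x, the weights ω(vv_j) and the matching cases, and the names `combine_children`, `child_tables` and `p_max` are hypothetical.

```python
NEG = float("-inf")  # marks infeasible table entries


def combine_children(child_tables, p_max):
    """Combine per-child tables one child at a time.

    child_tables[i][q] is assumed to hold the best weight obtainable in the
    subtree of child i with exactly q paths (NEG if infeasible).
    Returns acc with acc[p] = best total weight using exactly p paths overall.
    """
    # No child processed yet: only p = 0 is feasible, with weight 0.
    acc = [0] + [NEG] * p_max
    for table in child_tables:              # plays the role of j = 1, ..., c
        nxt = [NEG] * (p_max + 1)
        for p in range(p_max + 1):
            # Try every share p_j <= p for the current child.
            for pj in range(min(p, len(table) - 1) + 1):
                cand = table[pj] + acc[p - pj]
                if cand > nxt[p]:
                    nxt[p] = cand
        acc = nxt
    return acc


tables = [[0, 9], [0, 3], [NEG, 12]]
print(combine_children(tables, 3))  # [-inf, 12, 21, 24]
```

Each child costs O(p_max²) work here, so the total is polynomial instead of enumerating all splits of p.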
![Page 135: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/135.jpg)
3-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Scaffolding
1. sort all edges by weight
2. repeatedly take the heaviest edge, if possible
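The two steps can be sketched in Python as follows. This is a minimal sketch under assumptions that the slide does not spell out: “possible” is read as “both endpoints are still free of chosen edges, and the edge does not close more alternating cycles than allowed”. The function name `greedy_scaffold`, the `(weight, u, v)` edge format and the `max_cycles` parameter are illustrative choices, and the edge list is assumed to contain non-matching edges only.

```python
def greedy_scaffold(n, matching, edges, max_cycles=0):
    """Greedily pick heavy non-matching edges.

    n          -- number of vertices, labelled 0..n-1
    matching   -- list of (u, v) pairs forming the perfect matching M
    edges      -- list of (weight, u, v) non-matching edges
    max_cycles -- how many alternating cycles may be closed (assumption)
    """
    parent = list(range(n))

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Each matching edge starts out as its own alternating path.
    for u, v in matching:
        parent[find(u)] = find(v)

    used = [False] * n   # vertex already incident with a chosen edge?
    chosen, cycles = [], 0
    for w, u, v in sorted(edges, reverse=True):   # heaviest first
        if used[u] or used[v]:
            continue                              # would break alternation
        ru, rv = find(u), find(v)
        if ru == rv:                              # edge closes a cycle
            if cycles >= max_cycles:
                continue
            cycles += 1
        else:
            parent[ru] = rv                       # merge two paths
        used[u] = used[v] = True
        chosen.append((u, v, w))
    return chosen


edges = [(5, 1, 2), (4, 0, 3), (3, 0, 2), (1, 1, 3)]
print(greedy_scaffold(4, [(0, 1), (2, 3)], edges, max_cycles=0))  # [(1, 2, 5)]
print(greedy_scaffold(4, [(0, 1), (2, 3)], edges, max_cycles=1))  # [(1, 2, 5), (0, 3, 4)]
```

In this sketch the sort dominates the running time; the union-find only serves to detect whether an edge would close a cycle.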
Proof
Result S∗ is a valid solution ✓
Note: taking an edge forbids ≤ 3 OPT edges
mark the ≤ 3 OPT edges when taking an edge e ⇒ e is heaviest among them
⇒ 3ω(S∗) ≥ OPT
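Spelled out, the marking argument gives the chain of inequalities below. This is a sketch of the charging step under two assumptions: edge weights are non-negative, and every edge of an optimal solution gets marked by some taken edge (an OPT edge that the greedy takes marks itself). The symbols O (an optimal edge set) and m(e) ⊆ O (the OPT edges marked when e is taken) are introduced here for illustration only.

```latex
\omega(O)
  \;\le\; \sum_{e \in S^*} \sum_{e' \in m(e)} \omega(e')
  \;\le\; \sum_{e \in S^*} |m(e)| \cdot \omega(e)
  \;\le\; 3\,\omega(S^*)
```

The middle step uses that e is the heaviest edge still available when it is taken, so ω(e′) ≤ ω(e) for every e′ ∈ m(e); the last step uses |m(e)| ≤ 3.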
Theorem
Scaffolding in complete (bipartite) graphs can be 3-approximated in O(|V| log |V|) time.
Remark
For Exact Scaffolding, we have to keep an eye on the number of components too.
![Page 156: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/156.jpg)
2-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Exact Scaffolding
1. compute a max.-weight perfect matching S ⇒ S ∪ M is a collection of cycles
2. “fix” all but the lightest edge per cycle
3. repeatedly flip any lightest non-fix 4-cycle intersecting 2 cycles until at most σc + σp cycles remain
4. repeatedly remove the lightest non-fix cycle-edge until at most σc cycles remain
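Steps 1 and 2 rely on the fact that the union of two perfect matchings decomposes into alternating cycles. The Python sketch below assumes that S has already been computed (the matching computation itself is not shown) and only walks the cycles of S ∪ M and picks the lightest S-edge of each, so that all other S-edges can be treated as fixed. The names `as_mates`, `alternating_cycles` and `lightest_s_edge`, as well as the `frozenset`-keyed weight dictionary, are assumptions made for this example.

```python
def as_mates(pairs):
    """Turn a list of (u, v) pairs into a symmetric partner lookup."""
    mate = {}
    for u, v in pairs:
        mate[u], mate[v] = v, u
    return mate


def alternating_cycles(mate_m, mate_s):
    """Split the union of two perfect matchings into alternating cycles."""
    seen, cycles = set(), []
    for start in mate_m:
        if start in seen:
            continue
        cycle, v = [], start
        while v not in seen:
            u = mate_m[v]          # follow the matching (contig) edge
            seen.update((v, u))
            cycle.extend((v, u))
            v = mate_s[u]          # follow the S-edge to the next contig
        cycles.append(cycle)
    return cycles


def lightest_s_edge(cycle, mate_s, weight):
    """Return the lightest S-edge on a cycle; the others would be 'fixed'."""
    # S-edges leave the cycle vertices at odd positions.
    s_edges = [(cycle[i], mate_s[cycle[i]]) for i in range(1, len(cycle), 2)]
    return min(s_edges, key=lambda e: weight[frozenset(e)])


M = as_mates([(0, 1), (2, 3), (4, 5), (6, 7)])
S = as_mates([(1, 2), (3, 0), (5, 6), (7, 4)])
w = {frozenset((1, 2)): 5, frozenset((0, 3)): 4,
     frozenset((5, 6)): 2, frozenset((4, 7)): 7}

cycles = alternating_cycles(M, S)
print(cycles)                                       # [[0, 1, 2, 3], [4, 5, 6, 7]]
print([lightest_s_edge(c, S, w) for c in cycles])   # [(3, 0), (5, 6)]
```

Steps 3 and 4 (flipping 4-cycles and removing edges) are not covered by this sketch.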
![Page 160: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/160.jpg)
2-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Exact Scaffolding1. compute max.-weight perfect
matching S S ∪M is collection of cycles
2. “fix” all but lightest edge per cycle
3. repeatedly flip any lightest non-fix
4-cycle intersecting 2 cycles
until at most σc + σp cycles remain
4. repeatedly remove lightest non-fix
cycle-edge
until at most σc cycles remain
15 /33
![Page 161: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/161.jpg)
2-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Exact Scaffolding1. compute max.-weight perfect
matching S S ∪M is collection of cycles
2. “fix” all but lightest edge per cycle
3. repeatedly flip any lightest non-fix
4-cycle intersecting 2 cycles
until at most σc + σp cycles remain
4. repeatedly remove lightest non-fix
cycle-edge
until at most σc cycles remain
15 /33
![Page 162: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/162.jpg)
2-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Exact Scaffolding1. compute max.-weight perfect
matching S S ∪M is collection of cycles
2. “fix” all but lightest edge per cycle
3. repeatedly flip any lightest non-fix
4-cycle intersecting 2 cycles
until at most σc + σp cycles remain
4. repeatedly remove lightest non-fix
cycle-edge
until at most σc cycles remain
15 /33
![Page 163: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/163.jpg)
2-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Exact Scaffolding1. compute max.-weight perfect
matching S S ∪M is collection of cycles
2. “fix” all but lightest edge per cycle
3. repeatedly flip any lightest non-fix
4-cycle intersecting 2 cycles
until at most σc + σp cycles remain
4. repeatedly remove lightest non-fix
cycle-edge
until at most σc cycles remain
15 /33
![Page 164: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/164.jpg)
2-Approximation in Dense Graphs
Proof
The result S* is a valid solution ✓
ω(S*) ≥ ω(fix) ≥ ω(S)/2 ≥ OPT/2
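The chain of inequalities can be unpacked as follows. This is a sketch under assumptions not stated on the slide: weights are non-negative, and S is disjoint from M, so every cycle C of S ∪ M contains at least two edges of S. The lightest of at least two edges weighs at most half of their sum, hence

$$
\omega(\text{fix}) \;=\; \sum_{C} \Bigl( \omega(S \cap C) - \min_{e \in S \cap C} \omega(e) \Bigr) \;\ge\; \sum_{C} \tfrac{1}{2}\,\omega(S \cap C) \;=\; \tfrac{1}{2}\,\omega(S).
$$

The first inequality, ω(S*) ≥ ω(fix), holds because Steps 3 and 4 only ever touch non-fixed edges. For the last one, the non-matching edges of any solution form a matching; in a complete graph this matching can presumably be extended to a perfect one without losing weight, which gives ω(S) ≥ OPT.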
15 /33
![Page 167: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/167.jpg)
2-Approximation in Dense Graphs
Theorem
Exact Scaffolding in complete (bipartite) graphs can be 2-approximated in O(|V|²) time.
Remark
For Scaffolding, replace Step 3 by either merging cycles or removing the lightest edge, whichever loses less weight.
15 /33
![Page 176: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/176.jpg)
Exact Algorithms I: Brute Force
[Figure: vertex ordering with positions j, i − 2 and i marked]
$[p, c, j]_i$ := max. weight collectible before $v_i$ with p paths & c cycles, plus one path starting at $v_j$

$$[p, c, j]_i = [p, c, j]_{i-2} + \omega(v_{i-2}v_{i-1}) \quad \text{if } j < i-2 \text{ and } v_{i-2}v_{i-1} \in E$$

$$[p, c, i-1]_i = \max_{\substack{j < i-2 \\ j \text{ even}}} \begin{cases} [p-1, c, j]_{i-2} \\ [p, c-1, j]_{i-2} + \omega(v_j v_{i-2}) & \text{if } v_j v_{i-2} \in E \end{cases}$$

Observation
An ordering of V(G) certifies YES-instances of Scaffolding.
→ try all O(n!) certificates
→ contigs force every other vertex: $O(n!!) = O(\sqrt{2}^{\,n} \cdot (n/2)!)$
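The certificate count can be made concrete with a small enumeration. The sketch below is restricted, as an assumption for brevity, to a single alternating path covering all contigs (no cycles, no budgets): it tries every order of the n/2 contigs and every orientation, which gives exactly (n/2)! · 2^(n/2) candidates. The names `best_single_path` and `weight` are hypothetical.

```python
from itertools import permutations, product


def best_single_path(contigs, weight):
    """Brute force over all contig orders and orientations.

    contigs: list of matching edges (a, b)
    weight:  dict {frozenset({u, v}): w} for the non-matching edges
    Returns (total weight, oriented contigs) or None if no order works.
    """
    best = None
    for order in permutations(contigs):
        for flips in product((False, True), repeat=len(contigs)):
            oriented = [(b, a) if f else (a, b)
                        for (a, b), f in zip(order, flips)]
            total = 0
            feasible = True
            # Consecutive contigs must be linked by an existing edge.
            for (_, tail), (head, _) in zip(oriented, oriented[1:]):
                w = weight.get(frozenset((tail, head)))
                if w is None:
                    feasible = False
                    break
                total += w
            if feasible and (best is None or total > best[0]):
                best = (total, oriented)
    return best


contigs = [(0, 1), (2, 3), (4, 5)]
weight = {frozenset((1, 2)): 5, frozenset((3, 4)): 2, frozenset((0, 5)): 7}
print(best_single_path(contigs, weight))
```

Handling several paths and cycles would additionally require deciding where the ordering is cut, which is what the table entries $[p, c, j]_i$ keep track of.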
16 /33
![Page 177: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/177.jpg)
Exact Algorithms II: Dynamic Programming
[Figure: the set S − xy with vertices u and w, and the matching edge xy]
Semantics
$[S, p, c, u, v]$ = max. weight collectible in $G[S]$ by p alt. paths, c alt. cycles and an alt. path starting at u & ending at v
Computation
Let $xy \in M$. Then $[\{xy\}, 0, 0, x, y] := 0$ and

$$[S, p, c, u, y] := \max_{\substack{w \in G[S - xy] \\ u \neq w}} [S - xy, p, c, u, w] + \omega(wx)$$

$$[S, p, c, x, y] := \max_{u, w \in G[S - xy]} \begin{cases} [S - xy, p - 1, c, u, w] \\ [S - xy, p, c - 1, u, w] + \omega(wu) & \text{if } wu \in E(G) \setminus M \end{cases}$$
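A stripped-down version of this table can be written as a subset dynamic program. The sketch below is a simplification and not the full recurrence: it drops the counters p and c and only computes the best single alternating path covering all contigs, i.e. the first rule of the computation. Subsets of contigs are encoded as bitmasks; `max_alternating_path` and `weight` are hypothetical names.

```python
def max_alternating_path(contigs, weight):
    """dp[mask][(u, w)] = max weight of an alternating path that covers
    exactly the contigs in `mask`, starts at u and ends at w."""
    k = len(contigs)
    dp = [dict() for _ in range(1 << k)]
    for i, (x, y) in enumerate(contigs):
        dp[1 << i][(x, y)] = 0  # a single contig, in both orientations
        dp[1 << i][(y, x)] = 0
    for mask in range(1, 1 << k):
        for (u, w), val in dp[mask].items():
            for j, (a, b) in enumerate(contigs):
                if mask & (1 << j):
                    continue
                # Append contig j, entered at x and left at y.
                for x, y in ((a, b), (b, a)):
                    wt = weight.get(frozenset((w, x)))
                    if wt is None:
                        continue
                    nxt = dp[mask | (1 << j)]
                    if val + wt > nxt.get((u, y), float("-inf")):
                        nxt[(u, y)] = val + wt
    full = dp[(1 << k) - 1]
    return max(full.values()) if full else None


contigs = [(0, 1), (2, 3), (4, 5)]
weight = {frozenset((1, 2)): 5, frozenset((3, 4)): 2, frozenset((0, 5)): 7}
print(max_alternating_path(contigs, weight))
```

Adding the p and c dimensions would correspond to the second rule, where the current path is either closed off as a path or closed into a cycle before a new path is started at xy.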
17 /33
![Page 181: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/181.jpg)
Exact Algorithms II: Dynamic Programming
Theorem
Scaffolding can be solved in $O(\sqrt{2}^{\,n} n^3 \sigma_p \sigma_c)$ time.
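One plausible way to account for this bound, assuming n = |V| (so there are n/2 contigs) and that S ranges over unions of contigs:

$$
\underbrace{2^{n/2}}_{\text{sets } S} \cdot \underbrace{O(\sigma_p \sigma_c)}_{\text{values of } (p, c)} \cdot \underbrace{n^2}_{\text{pairs } (u, v)} \cdot \underbrace{O(n)}_{\max \text{ over } w} \;=\; O\bigl(\sqrt{2}^{\,n}\, n^3\, \sigma_p \sigma_c\bigr).
$$

Entries of the form $[S, p, c, x, y]$ with $xy \in M$ take a maximum over two vertices, but for fixed (S, p, c) there are only O(n) such entries, so they should fit into the same bound.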
17 /33
![Page 182: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/182.jpg)
Sparse Graphs: Quasi-Forest
Recall
• Scaffolding is hard in any sufficiently dense graph class
• Scaffolding is easy in trees
A Shot at Sparsity
G is a quasi-forest ⇔ G − M is a forest
Observation
Each leaf v of G − M has degree 2 in G.
If unweighted, can we take both?
→ remove all non-matching edges from the parent u, except uv
But is it even NP-hard?
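The quasi-forest definition above can be tested directly: delete the matching edges and check that the rest is acyclic. The sketch below uses union-find for the cycle test; the function name `is_quasi_forest` and the edge-list input format are assumptions made for illustration.

```python
def is_quasi_forest(n, edges, matching):
    """Return True if G - M is a forest, where G has vertices 0..n-1,
    `edges` lists all edges of G and `matching` lists the contig edges M."""
    m = {frozenset(e) for e in matching}
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        if frozenset((u, v)) in m:
            continue  # matching edges are not part of G - M
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this non-matching edge closes a cycle in G - M
        parent[ru] = rv
    return True


M = [(0, 1), (2, 3)]
print(is_quasi_forest(4, M + [(1, 2), (3, 0)], M))
print(is_quasi_forest(4, M + [(1, 2), (3, 0), (0, 2), (1, 3)], M))
```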
18 /33
![Page 190: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/190.jpg)
Sparse Graphs: Quasi-Forest
Observation
Each leaf v of G − M has degree 2 in G. If unweighted, can we take both? ✓
[Figure: leaf v of G − M with its parent u]
Observation
• v in a path & u in a cycle → 1 path ✓
• v in a path & u in a path → 2 paths ✓, unless it's the same path!
18 /33
![Page 193: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/193.jpg)
Sparse Graphs: Quasi-Forest
Observation
Each leaf v of G − M has degree 2 in G. If σp = 0, we have to take both!
→ remove all non-matching edges from the parent u, except uv
Corollary
Scaffolding can be solved in O(n) time on quasi-forests if σp = 0.
Scaffolding can be solved in $O(n^{2\sigma_p+1})$ time on quasi-forests.
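One way to read the leaf rule for σp = 0 is as leaf stripping in the forest G − M: with no paths allowed, every vertex needs a non-matching edge, so a leaf v must use its only edge uv, and all other edges at u become useless. The sketch below applies this rule until nothing is left; it is an illustration under that reading, not necessarily the lecture's exact procedure, and it does not check the cycle budget σc afterwards. `forced_edges` is a hypothetical name.

```python
from collections import defaultdict


def forced_edges(n, forest_edges):
    """Repeatedly take the unique edge at a leaf of the forest G - M.
    Returns the forced non-matching edges, or None if some vertex cannot
    receive a non-matching edge (no cycle-only cover exists)."""
    adj = defaultdict(set)
    for u, v in forest_edges:
        adj[u].add(v)
        adj[v].add(u)
    stack = [v for v in range(n) if len(adj[v]) == 1]
    done = [False] * n
    chosen = []
    while stack:
        v = stack.pop()
        if done[v] or len(adj[v]) != 1:
            continue  # stale entry: v was handled or lost its last edge
        (u,) = adj[v]
        chosen.append((u, v))
        done[u] = done[v] = True
        # Remove all other non-matching edges at the parent u.
        for x in list(adj[u]):
            adj[x].discard(u)
            if x != v and len(adj[x]) == 1 and not done[x]:
                stack.append(x)
        adj[u].clear()
        adj[v].clear()
    return chosen if all(done) else None


print(forced_edges(4, [(0, 1), (1, 2), (2, 3)]))  # path 0-1-2-3
print(forced_edges(4, [(0, 1), (0, 2), (0, 3)]))  # star centred at 0
```

Every edge is touched a constant number of times, which is in line with the linear running time claimed in the corollary.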
But is it even NP-hard?
18 /33
![Page 196: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/196.jpg)
Sparse Graphs: Quasi-Forest
Weighted 2-SAT
Input: ϕ on X in 2-CNF, weights ω : X × {0, 1} → ℕ, k ∈ ℕ
Question: Is there a satisfying assignment for ϕ of weight ≤ k?
Remark
Independent Set is a special case of Weighted 2-SAT.
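The problem definition and the remark can be illustrated with a tiny brute-force solver. The encoding of Independent Set used here is one possible choice, not taken from the slides: every graph edge uv becomes the clause (¬x_u ∨ ¬x_v), setting a variable to 1 costs 0 and setting it to 0 costs 1, so the minimum weight equals the number of vertices left out of a maximum independent set. All function names are hypothetical.

```python
from itertools import product


def min_weight_2sat(variables, clauses, omega):
    """Exhaustive search for a minimum-weight satisfying assignment.

    clauses: list of pairs of literals; a literal (x, b) is satisfied
             when variable x is assigned the value b
    omega:   dict {(x, b): weight of assigning b to x}
    Returns the minimum weight, or None if the formula is unsatisfiable.
    """
    best = None
    for bits in product((0, 1), repeat=len(variables)):
        assign = dict(zip(variables, bits))
        if all(assign[a] == va or assign[b] == vb
               for (a, va), (b, vb) in clauses):
            cost = sum(omega[(x, assign[x])] for x in variables)
            if best is None or cost < best:
                best = cost
    return best


def independent_set_as_2sat(vertices, edges):
    """Assumed encoding: x_v = 1 means 'v is in the independent set'."""
    clauses = [((u, 0), (v, 0)) for u, v in edges]
    omega = {}
    for x in vertices:
        omega[(x, 1)] = 0  # taking a vertex is free
        omega[(x, 0)] = 1  # leaving a vertex out costs 1
    return clauses, omega


V = ["a", "b", "c"]
clauses, omega = independent_set_as_2sat(V, [("a", "b"), ("b", "c")])
print(min_weight_2sat(V, clauses, omega))
```

On the path a-b-c the largest independent set is {a, c}, so exactly one vertex is left out.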
19 /33
![Page 204: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/204.jpg)
Sparse Graphs: Quasi-Forest
Observation
∃ weight-k satisfying assignment ⇔ ∃ weight-k cover with ≤ n alternating paths
Theorem
Scaffolding is NP-hard even if G − M is a collection of paths with weights 0/1
Corollary
no $2^{o(n+m)}$-time algorithm (ETH)
no $n^{o(k)}$-time algorithm (FPT ≠ W[t])
19/33
![Page 211: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/211.jpg)
Other Forms of Tree-Likeness
Tree Decompositions
tree $T$, each vertex $i$ associated to some $X_i \subseteq V(G)$ s.t.
1. ∀ e ∈ E(G), there is some $i \in V(T)$ with $e \subseteq X_i$
2. ∀ v ∈ V(G), bags containing v induce a connected subtree
treewidth tw = size of largest bag − 1
Hope
Practical instances of Scaffolding have low treewidth (they originate from linear structure)
Nice Decompositions
Leaf: $X = \emptyset$
Introduce v: i has single child j and $X_i \setminus X_j = \{v\}$
Forget v: i has single child j and $X_j \setminus X_i = \{v\}$
Introduce uv: i has single child j and $uv \subseteq X_i = X_j$ (each edge introduced exactly once)
Join: i has 2 children j and ℓ and $X_i = X_j = X_\ell$
20/33
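The two conditions of a tree decomposition can be checked mechanically. The following is a minimal sketch, not part of the lecture: the function names and the input encoding (bags as a dict from tree node to vertex set, `tree_edges` as a list of node pairs) are assumptions made for illustration. It also assumes the usual convention that every vertex must occur in at least one bag.

```python
from collections import defaultdict


def _connected(nodes, adj):
    """Return True if `nodes` induce a connected subgraph of the graph `adj`."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        i = stack.pop()
        for j in adj[i]:
            if j in nodes and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen == nodes


def is_tree_decomposition(vertices, graph_edges, bags, tree_edges):
    """Check the two conditions from the slide for bags X_i over a tree T."""
    adj = defaultdict(set)
    for i, j in tree_edges:
        adj[i].add(j)
        adj[j].add(i)
    # T itself must be a tree: connected with |V(T)| - 1 edges.
    if len(tree_edges) != len(bags) - 1 or not _connected(bags, adj):
        return False
    # Condition 1: every edge of G lies inside some bag.
    for u, v in graph_edges:
        if not any(u in X and v in X for X in bags.values()):
            return False
    # Condition 2: the bags containing v induce a connected subtree of T.
    for v in vertices:
        if not _connected([i for i, X in bags.items() if v in X], adj):
            return False
    return True


def width(bags):
    """Width of the decomposition: size of the largest bag minus one."""
    return max(len(X) for X in bags.values()) - 1
```

For a path a-b-c-d, the bags {a,b}, {b,c}, {c,d} arranged along a path in T pass the check with width 1, which matches the hope that instances with linear structure have low treewidth.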
![Page 214: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/214.jpg)
How do Solutions Interact with Bags?
[Figure: bag $X_i$ and the subgraph $G_i[S]$]
Ingredients
• degree-function $d : X \to \{0, 1, 2\}$
• “pairing” $\subseteq \binom{X}{2} \cup X$, number of possible pairings: $O(|X|^{|X|/2})$
• #paths and #cycles completed “below the bag”
21/33
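To make the size of the state space concrete, the sketch below enumerates pairings and counts (d, P) combinations for one bag. It is an illustration only: the function names are hypothetical, and it assumes that the pairing P splits exactly the degree-1 vertices of the bag into singletons (written vv) and unordered pairs uv.

```python
from itertools import product


def pairings(items):
    """Yield every split of `items` into singletons (v, v) and pairs (u, v)."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    # Option 1: `first` stays alone (its path ends below the bag).
    for p in pairings(rest):
        yield [(first, first)] + p
    # Option 2: `first` is paired with some other item.
    for k, other in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for p in pairings(remaining):
            yield [(first, other)] + p


def count_states(bag):
    """Count (d, P) combinations: d maps the bag to {0, 1, 2}, P pairs d^-1(1)."""
    total = 0
    for d in product((0, 1, 2), repeat=len(bag)):
        degree_one = [v for v, dv in zip(bag, d) if dv == 1]
        total += sum(1 for _ in pairings(degree_one))
    return total
```

Calling `count_states` on bags of growing size shows how quickly the number of table indices per bag grows, before the path and cycle counters are even taken into account.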
![Page 220: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/220.jpg)
Semantics
$[d, P, p, c]_i$ = max. weight of any $S$ with $M \cap E(G_i) \subseteq S \subseteq E(G_i)$ and
1. each vertex $v \in X_i$ has degree $d(v)$ in $G_i[S]$,
2. for each $uv \in P$, $G_i[S]$ contains an alternating path
   - $u = v$: from $u$, avoiding $d^{-1}(1)$
   - $u \ne v$: from $u$ to $v$
3. $G_i[S]$ contains $p$ alt. paths & $c$ alt. cycles avoiding $d^{-1}(1)$
![Page 221: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/221.jpg)
Leaf Bag
$[\emptyset, \emptyset, 0, 0]_i = 0$
![Page 222: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/222.jpg)
Introduce v (single child j)
$[d, P, p, c]_i = [d|_{v \to \bot}, P, p, c]_j$
![Page 224: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/224.jpg)
Introduce uv (single child j)
Case 1: $d(u) = d(v) = 2$
$[d, P, p, c]_i = [d|_{u \to 1, v \to 1},\ P + uv,\ p,\ c - 1]_j$
![Page 226: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/226.jpg)
Forget v (single child j)
$$[d, P, p, c]_i = \max \begin{cases}
[d|_{v \to 1},\ P + vv,\ p - 1,\ c]_j \\
\max_{uu \in P}\ [d|_{v \to 1},\ (P - uu) + uv,\ p,\ c]_j \\
\max_{x \in \{0, 2\}}\ [d|_{v \to x},\ P,\ p,\ c]_j
\end{cases}$$
![Page 233: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/233.jpg)
Join Bag (children j & ℓ)
$$[d, P, p, c]_i = \max_{d_j, P_j, p_j, c_j}\ \max_{P_\ell :\ P_j \sqcup P_\ell = P} \Big( [d_j, P_j, p_j, c_j]_j + [d - d_j, P_\ell, p - p_j, c - c_j]_\ell \Big)$$
$O(2^{tw} \cdot tw^{tw/2} \cdot \sigma_p \cdot \sigma_c)$ table entries and $O((tw + 2)^{tw} \cdot \sigma_p \cdot \sigma_c \cdot n)$ time
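To get a feel for these bounds, the two expressions can be evaluated for small parameter values. This is only arithmetic on the formulas as stated: the constants hidden by the O-notation are ignored, and the names `sigma_p` and `sigma_c` are chosen here for the bounds σ_p and σ_c on the number of paths and cycles.

```python
def table_entries(tw, sigma_p, sigma_c):
    """Evaluate 2^tw * tw^(tw/2) * sigma_p * sigma_c (constants ignored)."""
    return 2 ** tw * tw ** (tw / 2) * sigma_p * sigma_c


def time_bound(tw, sigma_p, sigma_c, n):
    """Evaluate (tw + 2)^tw * sigma_p * sigma_c * n (constants ignored)."""
    return (tw + 2) ** tw * sigma_p * sigma_c * n


if __name__ == "__main__":
    # Print both expressions for a few treewidths with sigma_p = sigma_c = n = 1.
    for tw in (2, 4, 6, 8, 10):
        print(tw, table_entries(tw, 1, 1), time_bound(tw, 1, 1, 1))
```

Both expressions grow super-exponentially in tw, so even moderate treewidth makes the tables large.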
![Page 234: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/234.jpg)
Too slow in practice!
22/33
![Page 235: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/235.jpg)
Integer Linear Program Formulation
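[Figure: graph with special vertices $s$, $t_c$, $t_p$]
- chromosomes = disjoint s-t-paths
- bin. variables $y_{uv} = 1 \Leftrightarrow u \to v$ used, $x_{\{u,v\}} = y_{uv} + y_{vu}$
- force contigs: $\forall uv \in M:\ x_{\{u,v\}} = 1$
- path preservation: $\forall u \ne s, t:\ \sum_v y_{vu} = \sum_v y_{uv}$
- path bounds: $\sum_v y_{vt} \le \sigma$
- forbid cycles (row generation via callback): $\forall$ cycle $C:\ \sum_{uv \in C} y_{uv} < |C|$
- objective: $\max \sum_{e = \{u,v\} \in E} x_{\{u,v\}} \cdot \omega(e)$
- cycle consistency: $\forall u:\ y_{u t_c} \le y_{su}$
- jump mechanics !!!
23/33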
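The "forbid cycles" constraints are exponentially many, which is why they are generated on demand. The sketch below shows only the separation step such a callback would need, independent of any particular solver: given the arcs with $y_{uv} = 1$ in a candidate solution, it looks for a directed cycle $C$ and returns the arcs and right-hand side of the violated constraint $\sum_{uv \in C} y_{uv} \le |C| - 1$. The function names and the arc-list encoding are assumptions for illustration, and the special vertices $s$ and $t$ are assumed to have been left out of the arc list.

```python
def find_directed_cycle(arcs):
    """Return the arcs of some directed cycle in `arcs`, or None if acyclic."""
    succ = {}
    for u, v in arcs:
        succ.setdefault(u, []).append(v)
        succ.setdefault(v, [])
    WHITE, GREY, BLACK = 0, 1, 2
    color = {v: WHITE for v in succ}
    for root in succ:
        if color[root] != WHITE:
            continue
        # Iterative depth-first search; `path` holds the grey vertices in order.
        color[root] = GREY
        path, iters = [root], [iter(succ[root])]
        while path:
            try:
                w = next(iters[-1])
            except StopIteration:
                color[path.pop()] = BLACK
                iters.pop()
                continue
            if color[w] == GREY:
                # Back arc found: the cycle is the tail of `path` starting at w.
                cyc = path[path.index(w):]
                return [(cyc[i], cyc[(i + 1) % len(cyc)]) for i in range(len(cyc))]
            if color[w] == WHITE:
                color[w] = GREY
                path.append(w)
                iters.append(iter(succ[w]))
    return None


def violated_cycle_constraint(selected_arcs):
    """Return (cycle_arcs, rhs) for the cut sum(y_uv over cycle) <= rhs, or None."""
    cycle = find_directed_cycle(selected_arcs)
    if cycle is None:
        return None
    return cycle, len(cycle) - 1
```

In a row-generation loop, the returned arcs would be turned into a new linear constraint and the model re-solved until no cycle remains.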
![Page 241: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/241.jpg)
- chromosomes = disjoint $s$-$\{t_p, t_c\}$-paths
- bin. variables $y_{uv} = 1 \Leftrightarrow u \to v$ used, $x_{\{u,v\}} = y_{uv} + y_{vu}$
- force contigs: $\forall uv \in M:\ x_{\{u,v\}} = 1$
- path preservation: $\forall u \ne s, t_p, t_c:\ \sum_v y_{vu} = \sum_v y_{uv}$
- path & cycle bounds: $\sum_v y_{v t_p} \le \sigma_p$ and $\sum_v y_{v t_c} \le \sigma_c$
- forbid cycles (row generation via callback): $\forall$ cycle $C:\ \sum_{uv \in C} (y_{uv} - y_{u t_c}) < |C|$
- objective: $\max \sum_{e = \{u,v\} \in E} x_{\{u,v\}} \cdot \omega(e)$
- cycle consistency: $\forall u:\ y_{u t_c} \le y_{su}$
- jump mechanics !!!
![Page 242: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/242.jpg)
Extension: Contig Jumps
Mean read length: 70bp
Mean insert size: 472bp
[Figure: example scaffold graph annotated with numeric labels]
24/33
![Page 252: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/252.jpg)
Integer Linear Program Formulation
- $x_{\{u,v\}} = y_{uv} + y_{vu} + z_{uv} + z_{vu}$ (all other constraints unchanged)
Jump Mechanics
for each non-contig uv,
1. introduce a variable $z_{uv}$
2. construct “jump network” between u and v that fits in the gap
3. add $z_{uv}$ to $x_{\{u,v\}}$
extra: preprocess instance to finish incomplete jumps
25/33
![Page 253: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/253.jpg)
Extension: Contig Jumps
Mean read length: 70bp
Mean insert size: 472bp
[Figure: example graph with numeric labels]
26/33
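As a rough numeric illustration of when a contig can be jumped: a read pair can only span contigs that fit between its two reads. The sketch below assumes that the insert size counts both reads (conventions differ between libraries) and works with the means only, so it is an estimate rather than a rule; `inner_distance` and `can_jump` are hypothetical helpers, and the contig lengths 100 and 400 are made up for the example.

```python
def inner_distance(insert_size, read_length):
    """Unsequenced distance between the two reads of a pair.

    Assumes insert_size includes both reads (an assumption; conventions vary).
    """
    return insert_size - 2 * read_length


def can_jump(contig_length, insert_size, read_length):
    """True if a contig of this length fits between the paired reads."""
    return contig_length <= inner_distance(insert_size, read_length)


# Slide values: mean read length 70bp, mean insert size 472bp.
print(inner_distance(472, 70))  # 332
print(can_jump(100, 472, 70))   # True
print(can_jump(400, 472, 70))   # False
```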
![Page 256: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/256.jpg)
ILP Extension: Multiplicities
GGTGCGAGAGAGGTCATGGATTGCAACGA
GGTGCGAGAGGCCACTCCAATTGCAACGA
[Figure: the two sequences as contigs with multiplicities ×2, ×1, ×1, ×2]
27/33
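One way to read the two sequences is that they share their flanks and differ only in the middle, so the shared pieces occur twice and each variant once. This is an interpretation of the figure, not something the slide states in words. The hypothetical helper `split_shared` below makes the decomposition explicit.

```python
from os.path import commonprefix  # works character-wise on any strings


def split_shared(a, b):
    """Split two sequences into shared prefix, differing middles, shared suffix."""
    prefix = commonprefix([a, b])
    rest_a, rest_b = a[len(prefix):], b[len(prefix):]
    # The common suffix is the reversed common prefix of the reversed remainders;
    # working on the remainders keeps prefix and suffix from overlapping.
    suffix = commonprefix([rest_a[::-1], rest_b[::-1]])[::-1]
    mid_a = rest_a[:len(rest_a) - len(suffix)]
    mid_b = rest_b[:len(rest_b) - len(suffix)]
    return prefix, (mid_a, mid_b), suffix


seq1 = "GGTGCGAGAGAGGTCATGGATTGCAACGA"
seq2 = "GGTGCGAGAGGCCACTCCAATTGCAACGA"
prefix, middles, suffix = split_shared(seq1, seq2)
print(prefix)   # GGTGCGAGAG   (shared, occurs twice)
print(middles)  # ('AGGTCATGG', 'GCCACTCCA')   (one occurrence each)
print(suffix)   # ATTGCAACGA   (shared, occurs twice)
```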
![Page 259: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/259.jpg)
ILP Extension: Multiplicities
[Figure: larger example graph with numeric labels, one element marked ×2]
28/33
![Page 262: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/262.jpg)
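Integer Linear Program Formulation
[Figure: terminals $s$, $t_c$, $t_p$]
- chromosomes = disjoint $s$-$\{t_p, t_c\}$-paths
- int. variables $y_{uv} = \ell \Leftrightarrow u \to v$ used $\ell$ times
  $x_{\{u,v\}} = y_{uv} + y_{vu} + z_{uv} + z_{vu}$
- force contigs: $\forall uv \in M:\ x_{\{u,v\}} \ge 1$
- path preservation: $\forall u \neq s, t_p, t_c:\ \sum_v y_{vu} = \sum_v y_{uv}$
- path & cycle bounds: $\sum_v y_{v t_{\{p,c\}}} \le \sigma_{\{p,c\}}$
- forbid cycles (row generation via callback): $\forall \text{ cycle } C:\ \sum_{uv \in C} y_{uv} \le |C| \cdot m_{\max} \cdot \sum_{u \in C,\, v \notin C} y_{uv}$
- objective: $\max \sum_{e \in E} x_{\{u,v\}} \cdot \omega(e)$, where $e = \{u,v\}$
- cycle consistency: $\forall u:\ y_{u t_c} \le y_{su}$
- jump mechanics
Multiplicities
1. make $y_{uv}$, $x_{\{u,v\}}$ integers in domain $[0, m(\{u,v\})]$
2. change callback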
29/33
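The changed cycle constraint says that arcs on a cycle $C$ may only carry flow if at least one arc leaves $C$. A callback would test this for cycles found in the current solution; the sketch below only shows the test itself. It assumes `y` is a dict from arcs to integer multiplicities and that a cycle is given as a list of its vertices; the function name `cycle_cut_violated` is hypothetical, and how the callback finds candidate cycles is not covered.

```python
def cycle_cut_violated(y, cycle, m_max):
    """Check the multiplicity cycle constraint for one cycle.

    y      -- dict mapping arcs (u, v) to integer multiplicities (assumed encoding)
    cycle  -- list of vertices v0, v1, ..., vk-1; the arc vk-1 -> v0 closes it
    m_max  -- largest multiplicity in the instance
    Returns True if  sum_{uv in C} y_uv  >  |C| * m_max * sum_{u in C, v not in C} y_uv.
    """
    on_cycle = set(cycle)
    arcs = list(zip(cycle, cycle[1:] + cycle[:1]))
    lhs = sum(y.get(arc, 0) for arc in arcs)
    leaving = sum(
        used for (u, v), used in y.items()
        if u in on_cycle and v not in on_cycle
    )
    return lhs > len(arcs) * m_max * leaving


# An isolated cycle a -> b -> c -> a used twice: the cut is violated.
y = {("a", "b"): 2, ("b", "c"): 2, ("c", "a"): 2}
print(cycle_cut_violated(y, ["a", "b", "c"], m_max=2))  # True

# With one arc leaving the cycle (here to tc) the constraint holds.
y[("c", "tc")] = 1
print(cycle_cut_violated(y, ["a", "b", "c"], m_max=2))  # False
```

The factor $|C| \cdot m_{\max}$ bounds the left-hand side from above, so a single leaving arc is enough to satisfy the inequality.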
![Page 263: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/263.jpg)
Linearization of Solutions
Problem
in general, no unique chromosome configuration explains a solution
Proof
[Figure: example]
30/33
![Page 267: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/267.jpg)
Linearization of Solutions
Theorem
$(G, M, m)$ uniquely linearizable ⇔ no “ambiguous paths”
(= alternating path of uniform multiplicity $\mu$ & each end incident to non-contig edges of multiplicity $< \mu$)
[Figure: example graph with edge multiplicities]
30/33
![Page 268: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/268.jpg)
Linearization of Solutions
Proof
“⇒”: contraposition; let $p$ = ambiguous path
[Figure: ambiguous path with edge multiplicities]
then $(G, M, m)$ is not uniquely linearizable
30/33
![Page 272: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/272.jpg)
Linearization of Solutions
Proof
“⇐”: let $(G, M, m)$ be free of ambiguous paths
Reduction (does not decrease number of linearizations):
[Figure: reduction step on edge multiplicities]
result is collection of alternating paths & cycles
30/33
![Page 287: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/287.jpg)
Linearization of Solutions
Proposals
1. decide arbitrarily → misassembly
2. isolate each ambiguity → information loss
3. cut as few ends as possible → computationally hard
4. cut as few multiplicities as possible → computationally hard
Multiplicities one & #non-matching adj. to contig
30/33
![Page 296: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/296.jpg)
Conclusion
What we saw
- 3-step sequencing technique:
1. produce paired-end reads
2. assemble reads to contigs
3. scaffold contigs to chromosomes using read-pairings
- computationally hard problem for dense graphs with weights 0/1
- no constant-factor approx or subexponential-time algorithm for linear quasi trees with weights 0/1
- $O(n^2)$ time on unweighted cliques/co-bipartite/split
- $O(n \cdot \sigma_p \cdot \sigma_c)$ time for constant treewidth
- 2-approximable in cliques/complete bipartite in $O(n^3)$ time
- $O(\sqrt{2}^{\,n} \cdot \mathrm{poly}(n))$ time exact algorithm
- ILP formulation with contig jumps & multiplicities
- Linearization problem raised by multiplicities in solution
31/33
![Page 304: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/304.jpg)
Conclusion
Outlook
- 3rd generation sequencing: PacBio and Oxford Nanopore produce long reads (10-15 kbp), but these are error-prone
correction using short reads?
- generally: multi-library scaffolding
- other sources for contig-connections (phylogenetic information?)
- better parameters for Scaffolding and Scaffold Linearization
analyze practical instances
- approximation/heuristics for Scaffold Linearization
32/33
![Page 309: Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger Sequencing 1.split helix & create thousands of copies 2.add polymerase & floating bases:](https://reader033.fdocuments.in/reader033/viewer/2022060406/5f0f54b47e708231d443a0e5/html5/thumbnails/309.jpg)
Fin!
33/33