Assembling the Glanville fritillary genome
Panu Somervuo
University of Helsinki, MRG group & DNA sequencing and genomics lab
CSC Conference 2.6.2010 Next generation sequencing data analysis
Next generation sequencing
• Roche 454
• Illumina Solexa
• ABI SOLiD
Newbler: 320 Mbp, 220K contigs, N50: 1700 nt
mapping
SOLiD: 40K scaffolds
27M unique
Assembly pipeline
• 454
  – 10M single reads, 400 bp
• Illumina Solexa
  – 52M 2*101 paired-end (insert size 600 bp)
  – 102M 2*76 paired-end (insert size 600 bp)
  – error correction, SOAPdenovo
  – scaffolds: 2M 2*75 matepairs, span 1500, at every 25 bp
• SOLiD
  – 420M 2*50 matepairs (insert size 1 Kbp)
  – filtering: 96M
• EST
  – 26K
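The "2M 2*75 matepairs, span 1500 at every 25bp" step can be read as cutting artificial mate pairs out of the SOAPdenovo scaffolds with a sliding window. The following is only a minimal sketch of that idea; the function name `pseudo_matepairs`, its parameters, and the choice to keep both reads in forward orientation are assumptions for illustration, not the pipeline's actual code.

```python
def pseudo_matepairs(scaffold, read_len=75, span=1500, step=25):
    """Cut artificial mate pairs from one scaffold sequence.

    A window of `span` bases slides along the scaffold in `step`-base
    increments. For each window the first and last `read_len` bases
    are returned as a pair, together with the window start position.
    Both reads are kept in forward orientation (an assumption; a real
    assembler may expect a specific strand convention).
    """
    pairs = []
    # Scaffolds shorter than `span` produce an empty range and no pairs.
    for start in range(0, len(scaffold) - span + 1, step):
        window = scaffold[start:start + span]
        pairs.append((start, window[:read_len], window[-read_len:]))
    return pairs


if __name__ == "__main__":
    toy_scaffold = "ACGT" * 400  # hypothetical 1600 bp scaffold
    for start, left, right in pseudo_matepairs(toy_scaffold):
        print(start, left[:10], right[:10])
```

With a 25 bp step, every scaffold position far enough from the ends is covered by many pairs, which is what lets such pairs convey the scaffold layout to another assembler.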
Assembly validation 1: contigs vs nr
contig       BLASTX hits  top 5
contig00008  216  Bombyx mori (domestic silkworm), Bombyx mori (domestic silkworm), Aedes aegypti (Stegomyia aegypti), Nasonia vitripennis (jewel wasp), Nasonia vitripennis (jewel wasp)
contig00077  2    Acyrthosiphon pisum (pea aphid), Acyrthosiphon pisum (pea aphid)
contig00084  63   Apis mellifera (honey bee), Forficula auricularia (European earwig), Forficula auricularia (European earwig), Forficula auricularia (European earwig), Forficula auricularia (European earwig)
contig00094  2    Tribolium castaneum (red flour beetle), Apis mellifera (honey bee)
contig00198  203  Tribolium castaneum (red flour beetle), Tribolium castaneum (red flour beetle), Nasonia vitripennis (jewel wasp), Pediculus humanus corporis (human body louse), Apis mellifera (honey bee)
contig00208  68   Acyrthosiphon pisum (pea aphid), Acyrthosiphon pisum (pea aphid), Acyrthosiphon pisum (pea aphid), Tribolium castaneum (red flour beetle), Strongylocentrotus purpuratus
contig00216  163  Pediculus humanus corporis (human body louse), Culex quinquefasciatus (southern house mosquito), Aedes aegypti (Stegomyia aegypti), Culex quinquefasciatus (southern house mosquito), Tribolium castaneum (red flour beetle)
contig00229  39   Tribolium castaneum (red flour beetle), Culex quinquefasciatus (southern house mosquito), Pediculus humanus corporis (human body louse), Apis mellifera (honey bee), Drosophila pseudoobscura pseudoobscura
contig00251  76   Acyrthosiphon pisum (pea aphid), Pediculus humanus corporis (human body louse), Nematostella vectensis (starlet sea anemone), Strongylocentrotus purpuratus, Strongylocentrotus purpuratus
contig00278  90   Aedes aegypti (Stegomyia aegypti), Anopheles gambiae str. PEST, Nasonia vitripennis (jewel wasp), Drosophila willistoni, Drosophila virilis
contig00279  43   Bombyx mori (domestic silkworm), Culex quinquefasciatus (southern house mosquito), Culex quinquefasciatus (southern house mosquito), Anopheles gambiae str. PEST, Tribolium castaneum (red flour beetle)
contig00302  250  Acyrthosiphon pisum (pea aphid), Salmo salar (Atlantic salmon), Branchiostoma floridae (Florida lancelet), Ciona intestinalis, Ciona intestinalis
contig00310  26   Tribolium castaneum (red flour beetle), Acyrthosiphon pisum (pea aphid), Nasonia vitripennis (jewel wasp), Aedes aegypti (Stegomyia aegypti), Aedes aegypti (Stegomyia aegypti)
contig00321  218  Acyrthosiphon pisum (pea aphid), Aedes aegypti (Stegomyia aegypti), Aedes aegypti (Stegomyia aegypti), Tribolium castaneum (red flour beetle), Culex quinquefasciatus (southern house mosquito)
contig00471  91   Drosophila virilis, Drosophila mojavensis, Drosophila ananassae, Drosophila yakuba, Drosophila grimshawi
contig00507  3    Ostrinia nubilalis (European corn borer), Ostrinia nubilalis (European corn borer), Ostrinia nubilalis (European corn borer)
contig00525  250  Bombyx mori (domestic silkworm), Nasonia vitripennis (jewel wasp), Aedes aegypti (Stegomyia aegypti), Apis mellifera (honey bee), Apis mellifera (honey bee)
contig00533  8    Ostrinia nubilalis (European corn borer), Ostrinia nubilalis (European corn borer), Ostrinia nubilalis (European corn borer), Bombyx mori (domestic silkworm), Strongylocentrotus purpuratus
Assembly validation 2: Genomic contigs vs EST contigs
rev_contig310   1 --TTCAGAGAAACAAGTGAATTGAAATTTGATTATTTAtTTTCGTTTCAG 48
                    |||||||||||||||.|||||||||||||||||||||||||||||.||
contig402106    1 TTTTCAGAGAAACAAGTAAATTGAAATTTGATTATTTATTTtCGTTTTAG 50

rev_contig310  49 TATGAAGCAGCAGCGAGAGGTGCAGAAGCACTTGGAAACAGATATGGTAC 98
                  |||||||||||.||||||||||||||||||||||||||.|||||||||||
contig402106   51 TATGAAGCAGCCGCGAGAGGTGCAGAAGCACTTGGAAAAAGATATGGTAC 100

rev_contig310  99 AAAtTATAGAGTAGGAGtTGCCGCAGATATTCtTTGTAAGtTGTTTTTTT 148
                  ||||||||||||||||||||||||||||||||||||||||||||||||||
contig402106  101 AAATTATAGAGTAGGAGTTGCCGCAGATATTCTTTGTAAGTTGTTTTTTT 150

rev_contig310 149 AATCAGTTTAGCtTGCAGCtTTAAGACTATTATTATATATTTTTTTATCG 198
                  ||||.|||||.||||||||||||||||||||||||||| |||||||||||
contig402106  151 AATCGGTTTATCTTGCAGCTTTAAGACTATTATTATAT-TTTTTTtATCG 199

rev_contig310 199 TTGTACAGTAAGAAGCTACATAAtTTTTcCTACCGcCTA--TT-----gg 241
                  |||||||||||||||||||||||||||||||||||||||  ||     .|
contig402106  200 TTGTACAGTAAGAAGCTACATAATTTTTCCTACCGCCTATTTTGGGGGAG 249

rev_contig310 242 GGGGGGGGATTGTTGAATCAGTTAAGAATTAAAAGATGATGCTAtTTCAG 291
                  ||||||||||||||.|||||||.||||||| |||||||||||||||||||
contig402106  250 GGGGGGGgATTGTTAAATCAGTCAAGAATT-AAAGATGATGCTATTTCAG 298

rev_contig310 292 aATACtTaAACttTTTTTAAGAC--GAC---------T-A-TAA-GTTTA 327
                  ||.||||.|||||||||||||||  |||         | | ||. |||||
contig402106  299 AAAACTTCAACTTTTTTtAAGACTAGACTATTTTTAATAATTAGTGTTTA 348

rev_contig310 328 AATAACACTAATTATTaAAAACTTGGTCTATCTTGGTCTTGGtTTTAGGt 377
                  |||||||||||||||||||||||||.||||||||.||||||||.|.||||
contig402106  349 AATAACACTAATTATTAAAAACTTGATCTATCTTCGTCTTGGTCTAAGGT 398

rev_contig310 378 TTTTCCTCTAGTTAATATTACTGTTACAACTACATAAAAACAATAAAATA 427
                  ||.|||||||||||||.|||||||||||||||||||||||||||||..||
contig402106  399 TTGTCCTCTAGTTAATCTTACTGTTACAACTACATAAAAACAaTAAGGTA 448

rev_contig310 428 CTGTATCTTTGCAGATCCTATGAGCGGAACCACTTTTGACTGGGCGAAGA 477
                  |||||||||||.||||||||||||||||||||||||||||||||||||||
contig402106  449 CTGTATCTTTGTAGATCCTATGAGCGGAACCACTTTtGACTGGGCGAAGA 498

rev_contig310 478 ATACAACAAATGTCCCATTTTCTTACCTGATTGAATTAAGAGACTTGGGG 527
                  ||.|||||||||||||||||||||||||||||||||||||||||||||||
contig402106  499 ATGCAACAAATGTCCcATTTtCTTACCTGATTGAATTAAGAGACTtGGGg 548

rev_contig310 528 CAATACGGTTTCTTGTTACCAGCAGAACAGATTATTCCAACTAATTTAGA 577
                  |||||||||||||||||||||||||||||||||||.||||||||||||||
contig402106  549 CAaTACGGTTtCTTGTTACcAGCAGAACAGATTATACCAACTAATTtAGA 598

rev_contig310 578 AATAATGGATGCACTCCTGGAGATGGATAATACCGCAAGAACACTAgGG 626
                  ||||||||||||||||||||||||||||||.|||||||||||||||||.
contig402106  599 AATAaTGGATGCACTCcTGGAGATGGATAACACCGCAAGAACACTAGGA 647
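A pairwise alignment like the one above can be summarised by counting matches, mismatches and gap columns. The sketch below is illustrative only: `alignment_stats` is a hypothetical helper, and comparing bases case-insensitively (treating lower-case letters as ordinary bases) is an assumption about how the aligner reports them.

```python
def alignment_stats(query, subject):
    """Summarise two gapped, equal-length alignment rows.

    A column counts as a gap if either row has '-', as a match if the
    bases agree (ignoring case), and as a mismatch otherwise.
    Identity is matches divided by the number of alignment columns.
    """
    if len(query) != len(subject):
        raise ValueError("alignment rows must have the same length")
    if not query:
        raise ValueError("alignment rows must not be empty")
    matches = mismatches = gaps = 0
    for q, s in zip(query.upper(), subject.upper()):
        if q == "-" or s == "-":
            gaps += 1
        elif q == s:
            matches += 1
        else:
            mismatches += 1
    return {
        "matches": matches,
        "mismatches": mismatches,
        "gaps": gaps,
        "identity": matches / len(query),
    }


if __name__ == "__main__":
    # Second block of the alignment shown on the slide (columns 49-98).
    q = "TATGAAGCAGCAGCGAGAGGTGCAGAAGCACTTGGAAACAGATATGGTAC"
    s = "TATGAAGCAGCCGCGAGAGGTGCAGAAGCACTTGGAAAAAGATATGGTAC"
    print(alignment_stats(q, s))
```

Running the same count over all blocks would give one identity figure per genomic contig and EST contig pair, which is easier to compare across contigs than the raw alignment.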
What now? Still more sequencing needed...
• target enrichment: 55K 120nt probes
• 5’ SAGE
• longer matepairs for longer contigs & scaffolds
annotation
Challenges
• no elegant solution for combining SOLiD colorspace reads with other platforms in de novo assembly
• read quality: filtering vs error correction
• difficulties generating long matepairs
• how to finish the assembly project: validation
Goal: to get contigs/scaffolds useful for gene prediction
What is the best assembler?
• SOAPdenovo, Velvet, Newbler, CLC bio, Celera
• #contigs, contig lengths, accuracy
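The contig-count and contig-length criteria can be computed directly from a list of contig lengths. The sketch below is a minimal illustration; `n50` and `assembly_summary` are hypothetical helper names, and N50 is taken in its usual sense: the length of the contig at which the running sum of lengths, sorted longest first, reaches half of the total.

```python
def n50(lengths):
    """Return the N50 of a non-empty list of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings the sum to half the total.
        if 2 * running >= total:
            return length


def assembly_summary(lengths):
    """Collect the basic size metrics used to compare assemblies."""
    return {
        "contigs": len(lengths),
        "total_length": sum(lengths),
        "longest": max(lengths),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]  # made-up contig lengths
    print(assembly_summary(toy_lengths))
```

These numbers say nothing about accuracy, so they complement the BLASTX and EST validation checks rather than replace them.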
Assembling Solexa data
[Plots: number of contigs vs. contig size; sum of contig lengths vs. contig size]
• 52M 2*101 paired-end (insert size 600 bp)
• 102M 2*76 paired-end (insert size 600 bp)
• error correction (SOAPdenovo)
Assembling 454 data, 10M single reads 400 bp
[Plots: number of contigs vs. contig size; sum of contig lengths vs. contig size]
• Newbler: all 454 data + 2M 1500 nt matepairs from SOAPdenovo scaffolds
• CLC bio: all 454 data + all Solexa data
- read errors
- repetitive elements
De novo assembler history: Part I
de Bruijn graph
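As a reminder of what a de Bruijn graph is: every k-mer in the reads becomes an edge from its first k-1 bases to its last k-1 bases, and contigs correspond to unambiguous paths through the graph. The sketch below only builds the graph from toy reads; the function name `de_bruijn` and the inputs are illustrative assumptions, and real assemblers add error pruning, reverse complements and graph simplification on top.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph as {(k-1)-mer: set of successor (k-1)-mers}.

    Repeated k-mers collapse onto the same edge, which is how
    overlapping reads end up sharing one path in the graph.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    toy_reads = ["ACGT", "CGTA"]  # made-up overlapping reads
    for node, successors in sorted(de_bruijn(toy_reads, 3).items()):
        print(node, "->", sorted(successors))
```

A read error adds k-mers that occur only once and form short side branches, while a repeat longer than k makes a node with several successors; both show up as structure in this graph.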
De novo assembler history: Part II