MCB3895-004 Lecture #15 Oct 23/14 De novo assemblies using PacBio.
Improving and validating the Atlantic Cod genome assembly using PacBio
![Page 1: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/1.jpg)
Improving and validating the Atlantic Cod genome assembly using error-corrected
as well as raw PacBio reads
Lex Nederbragt, NSC and [email protected]
@lexnederbragtOK
![Page 2: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/2.jpg)
Acknowledgements
University of Oslo
Sequencing team NSC
Ole Kristian Tøressen
Kjetill Jakobsen
Sissel Jentoft
Cod genome group
Jason Miller, JCVI
Pacific Biosciences
![Page 3: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/3.jpg)
The Atlantic cod genome project
![Page 4: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/4.jpg)
Cod: the genome
850 million bases (Mbp)
Heterozygous
‘Wild-caught’
![Page 5: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/5.jpg)
Cod: phase 1
(Sanger sequencing)
454 sequencing
![Page 6: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/6.jpg)
N50
50% of the genome is in contigs at least as large as the N50 value
Courtesy of Michael Schatz, CSHL
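[Figure: N50 example for a 1000 bp genome, with contigs sorted by length and the running sums 400, 445, 490 and 520 marked alongside the N50 contig]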
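The N50 definition on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function name `n50` and the contig lengths are hypothetical, chosen only so that the running sums match the values visible in the figure (400, 445, 490, 520) for a 1000 bp genome.

```python
def n50(contig_lengths, genome_size=None):
    """Return the N50 of a list of contig lengths.

    If genome_size is None, the sum of the contig lengths is used
    as the reference size instead of a known genome size.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = genome_size if genome_size is not None else sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # The N50 is the length of the contig at which the running sum
        # first reaches half of the reference size.
        if running * 2 >= total:
            return length
    return 0  # the contigs cover less than half of genome_size


# Hypothetical contig lengths (bp); running sums: 300, 400, 445, 490, 520
example = [300, 100, 45, 45, 30, 20, 15, 15, 10, 10]
print(n50(example, genome_size=1000))  # 30
```

Note that the result depends on the reference size: with the assembly's own total length as the denominator, the same contigs give a different N50, which is why N50 values are only comparable between assemblies of the same genome.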
![Page 7: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/7.jpg)
Cod: phase 1
(Sanger sequencing)
454 sequencing
Phase 1 assembly
157 887 sequences
753 Mbp of 830 Mbp
[Diagram labels: scaffold, contig, gap]
Scaffold N50: 460 kbp
Contig N50: 2.8 kbp
![Page 8: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/8.jpg)
Cod: phase 1
6467 scaffolds
35% gap bases
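As a rough illustration of what a "gap bases" percentage measures, the sketch below counts `N` characters in scaffold sequences. It assumes gaps are written as runs of `N`, which is the usual convention but is not stated on the slide; the function name and the toy sequences are hypothetical.

```python
def gap_fraction(scaffold_sequences):
    """Fraction of scaffold bases that are gap (N) characters."""
    total = 0
    gaps = 0
    for seq in scaffold_sequences:
        total += len(seq)
        gaps += seq.upper().count("N")
    return gaps / total if total else 0.0


# Toy scaffolds, not cod data: 4 gap bases out of 20
toy = ["ACGTNNNNACGT", "ACGTACGT"]
print(f"{gap_fraction(toy):.0%}")  # 20%
```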
![Page 9: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/9.jpg)
The causes
Short Tandem Repeats (>20% of gaps)
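To make the idea of a short tandem repeat concrete, here is a minimal sketch that finds dinucleotide repeats such as (TG)n with a regular expression. It is only an illustration: the talk does not say how repeats were identified, and the function name, the `min_units` threshold and the toy sequence are assumptions.

```python
import re


def find_dinucleotide_repeats(sequence, min_units=5):
    """Return (start, end, unit) for each dinucleotide repeat found.

    A repeat is a 2-base unit occurring at least min_units times in a row.
    Homopolymer runs (e.g. AAAA...) are skipped.
    """
    # ([ACGT]{2}) captures the unit; \1{k,} requires k or more further copies.
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_units - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        if unit[0] == unit[1]:
            continue  # homopolymer, not a dinucleotide repeat
        hits.append((match.start(), match.end(), unit))
    return hits


# Toy sequence with a (TG)8 repeat starting at position 4
toy = "ACGT" + "TG" * 8 + "CCATG"
print(find_dinucleotide_repeats(toy))  # [(4, 20, 'TG')]
```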
![Page 10: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/10.jpg)
The causes
[Figure labels: Contig 1, Polymorphic contig 2, Polymorphic contig 3, Contig 4]
Heterozygosity?
![Page 11: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/11.jpg)
Cod: phase 2
New data
Illumina sequencing
Paired end >200x
Mate Pair 5 kb >100x
Improved/new software
![Page 12: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/12.jpg)
Cod: phase 2 goal
23 pseudochromosomes
Below 5% gap bases
Longer contigs
Phase 2 goal
Scaffold N50 1 Mbp
Contig N50 15 kbp
![Page 13: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/13.jpg)
Cod: phase 2 programs
Zhang et al. PLoSOne 2011
![Page 14: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/14.jpg)
Cod phase 2: status
Goal: contig N50 15 kbp, gaps <5%, scaffold N50 1.5 Mbp
Celera, 454 + Illumina: contig N50 9 kbp, gaps 5%, scaffolds too short
Newbler, 454: contig N50 6 kbp, gaps 24%, scaffolds OK
![Page 15: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/15.jpg)
Enter PacBio
![Page 16: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/16.jpg)
Large Insert Sizes
Sequencing
Aim for looooong insert sizes
Photo: Tore Oldeide Elgvin
147 SMRT Cells
Chemistry Coverage Av. Raw length
C2 9.2x 3.0 kb
C2-XL 3.2x 4.6 kb
XL-XL 3.5x 5.1 kb
TOTAL 15.9x
![Page 17: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/17.jpg)
Error-correction
Celera Assembler merTrim
+
27x
234x
PacBioToCA (Koren et al.)
+
13.7x
27x
9x (67%) recovered
![Page 18: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/18.jpg)
Using PacBio reads
![Page 19: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/19.jpg)
PacBio reads for cod
Columns: error-corrected reads | raw reads
Assembly improvement: Celera | PBJelly
Assembly validation: blasr | blasr, bridgemapper
De novo assembly: Celera | -
![Page 20: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/20.jpg)
PacBio reads for cod
Columns: error-corrected reads | raw reads
Assembly improvement: PBJelly | PBJelly
Assembly validation: blasr | blasr, bridgemapper
De novo assembly: Celera | -
![Page 21: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/21.jpg)
![Page 22: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/22.jpg)
![Page 23: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/23.jpg)
Assembly improvement: corrected reads
Goal: N50 15 kbp, gaps <5%
Celera, 454 reads: N50 9 kbp, gaps 5%
+ corrected PacBio + PBJelly: N50 11 kbp, gaps 1.5%
![Page 24: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/24.jpg)
![Page 25: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/25.jpg)
Assembly improvement: raw reads
Goal: N50 15 kbp, gaps <5%
Newbler, 454: N50 6 kbp, gaps 24%
+ raw PacBio + PBJelly: N50 30 kbp, gaps 20%
![Page 26: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/26.jpg)
Assembly improvement: raw reads
Goal: N50 15 kbp, gaps <5%
Celera, 454 + Illumina: N50 9 kbp, gaps 5%
+ raw PacBio + PBJelly: N50 46 kbp, gaps 1.5%
Too good to be true?
![Page 27: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/27.jpg)
![Page 28: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/28.jpg)
Assembly validation
Sequence
![Page 29: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/29.jpg)
Assembly validation
Sequence
Aligned raw PacBio reads
Coverage
![Page 30: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/30.jpg)
Assembly validation
Sequence
Aligned raw PacBio reads
Coverage
Aligned corrected PacBio reads
![Page 31: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/31.jpg)
Assembly validation
Raw PacBio reads
Corrected PacBio reads
(TG)n repeat (TG)n repeat
308 bp gap
Newbler scaffold
![Page 32: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/32.jpg)
Assembly validation
Raw PacBio reads
(AG)n repeat
939 bp gap
Newbler scaffold
Heterozygous region
![Page 33: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/33.jpg)
Assembly validation
Raw PacBio reads
Celera scaffold
Misassembly?
![Page 34: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/34.jpg)
![Page 35: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/35.jpg)
Assembly validation: bridgemapper (beta)
Split alignments: structural variation or misassemblies
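The idea of using split alignments as a signal can be sketched as follows. This is not bridgemapper's algorithm or interface, which the slides do not describe; the `Alignment` record, the `find_split_reads` function and the `max_ref_jump` threshold are all hypothetical. The sketch flags a read when consecutive aligned pieces land on different scaffolds or far apart on the same scaffold.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """One aligned piece of a read (hypothetical record layout)."""
    read_id: str
    scaffold: str
    ref_start: int
    ref_end: int
    read_start: int
    read_end: int


def find_split_reads(alignments, max_ref_jump=1000):
    """Return {read_id: reason} for reads whose pieces align inconsistently."""
    by_read = defaultdict(list)
    for aln in alignments:
        by_read[aln.read_id].append(aln)

    flagged = {}
    for read_id, alns in by_read.items():
        # Walk the pieces in the order they occur along the read.
        alns.sort(key=lambda a: a.read_start)
        for left, right in zip(alns, alns[1:]):
            if left.scaffold != right.scaffold:
                flagged[read_id] = "different scaffolds"
                break
            if abs(right.ref_start - left.ref_end) > max_ref_jump:
                flagged[read_id] = "distant positions"
                break
    return flagged


# Toy alignments, not cod data
toy = [
    Alignment("read1", "s1", 100, 2100, 0, 2000),
    Alignment("read1", "s1", 2150, 4000, 2010, 3900),
    Alignment("read2", "s1", 500, 1500, 0, 1000),
    Alignment("read2", "s2", 10, 900, 1000, 1900),
    Alignment("read3", "s1", 100, 1100, 0, 1000),
    Alignment("read3", "s1", 9000, 10000, 1000, 2000),
]
print(find_split_reads(toy))
```

A flagged read does not say which explanation applies: a real structural variant in the sequenced individual and an error in the assembly produce the same pattern, so flagged regions still need inspection.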
![Page 36: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/36.jpg)
bridgemapper (beta) on E. coli
Positions in the contig are color-coded
Illumina + Velvet
![Page 37: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/37.jpg)
s05514
bridgemapper (beta) on cod
2510 bp gap
Point to a 2350 bp scaffold
![Page 38: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/38.jpg)
s08737
bridgemapper (beta) on cod
2145 bp gap
Point to a 3 kbp scaffold
![Page 39: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/39.jpg)
![Page 40: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/40.jpg)
Assembly with error-corrected reads
Goal: contig N50 15 kbp, gaps <5%
Celera Assembly: contig N50 9 kbp, gaps 5%, scaffolds too short
CA + corrected PacBio + 454 mates: contig N50 8 kbp, gaps 2%, scaffolds very short
1.4 times genome size: underassembled
![Page 41: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/41.jpg)
The improved Atlantic cod genome: status
http://en.wikipedia.org
![Page 42: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/42.jpg)
Newbler plus Celera
[Diagram labels: scaffold, contig, gap]
Celera: Long contigs, short scaffolds
Slide courtesy of Ole Kristian Tøressen
![Page 43: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/43.jpg)
Newbler plus Celera
[Diagram labels: scaffold, contig, gap]
Celera: Long contigs, short scaffolds
Newbler: Short contigs, long scaffolds
Slide courtesy of Ole Kristian Tøressen
![Page 44: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/44.jpg)
Newbler plus Celera
[Diagram labels: scaffold, contig, gap]
Celera: Long contigs, short scaffolds
Newbler: Short contigs, long scaffolds
Combined: Long contigs, long scaffolds
Slide courtesy of Ole Kristian Tøressen
![Page 45: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/45.jpg)
Adding PacBio
Using PBJelly
[Diagram labels: contig, scaffold, PacBio reads, closed gap, reduced gap]
Slide courtesy of Ole Kristian Tøressen
![Page 46: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/46.jpg)
Polishing the assembly
454 and Illumina reads
Slide courtesy of Ole Kristian Tøressen
Contig
Scaffold
Contig N50: 30 - 40 kbp
Scaffold N50: 1 - 1.5 Mbp
![Page 47: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/47.jpg)
Image by Mathieu Thouvenin http://www.flickr.com/photos/mathoov/4681491052/
![Page 48: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/48.jpg)
![Page 49: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/49.jpg)
PacBio reads for cod
Columns: error-corrected reads | raw reads
Assembly improvement: PBJelly | PBJelly
Assembly validation: blasr | blasr, bridgemapper
De novo assembly: Celera | Celera
![Page 50: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/50.jpg)
Assembly
Goal: contig N50 15 kbp, gaps <5%
CA + corrected PacBio + 454 mates: contig N50 8 kbp, gaps 2%, scaffolds very short
CA + raw PacBio reads + 454 mates: contig N50 38 kbp, gaps <1%, scaffolds very short
1.6 times genome size: underassembled
![Page 51: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/51.jpg)
Lessons learned from PacBio reads
![Page 52: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/52.jpg)
Cod genome
[Figure labels: three homozygous regions and two heterozygous regions]
Heterozygous: large polymorphism (100s of bases)
Heterozygous: large indel (100s of bases)
![Page 53: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/53.jpg)
Atlantic cod version 2
23 pseudochromosomes
Below 5% gap bases
Longer contigs
New annotation
![Page 54: Improving and validating the Atlantic Cod genome assembly using PacBio](https://reader036.fdocuments.in/reader036/viewer/2022081516/554e83c4b4c905f66a8b5685/html5/thumbnails/54.jpg)
From observation to insight
Mathias Bigge, Ricordisamoa, others (wikimedia commons)
We need better programs