Large Plant Genome Assemblies using Phusion2
Zemin Ning, The Wellcome Trust Sanger Institute
Phusion2 Assembly Pipeline
NGS Data Assembly
Input data: Pair End Reads (170-800 bp) and Mate Pair Reads (2k-40k)
1. Filtering: Unikalow
2. Clustering: Phusion2
3. Contig Generation: SOAPdenovo, Fermi, ABySS
4. Contig Merge
5. Scaffolding: Spinner
6. Consensus Bases: Smalt & Gap5
ftp://ftp.sanger.ac.uk/pub/badger/aw7/icas_v061.tar.bz2
iCAS – an Illumina Clone Assembly System
Unikalow: ftp://ftp.sanger.ac.uk/pub/zn1/unikalow/
Data filtering using Unikalow
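The slide gives no details of how Unikalow filters reads. As a rough illustration of one common k-mer based approach (an assumption, not Unikalow's documented algorithm), the sketch below counts k-mers across all reads and drops any read containing a k-mer seen fewer than `min_count` times, on the basis that such rare k-mers are likely sequencing errors. The names `kmers` and `filter_reads` and the default `k=21` are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of seq (forward strand only, no canonicalisation)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def filter_reads(reads, k=21, min_count=2):
    """Hypothetical k-mer filter: keep reads whose k-mers all occur >= min_count times.

    Reads shorter than k have no k-mers and are therefore kept unchanged.
    """
    counts = Counter(km for read in reads for km in kmers(read, k))
    return [r for r in reads if all(counts[km] >= min_count for km in kmers(r, k))]


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGT", "ACGTTCGT"]
    # The third read carries k-mers seen only once, so it is filtered out.
    print(filter_reads(reads, k=4))
```

A production filter would also merge each k-mer with its reverse complement and use a memory-efficient counter; both are omitted here for brevity.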
Assembly Method
Sequencing reads:
1 ACCTGATC
2 CTGATCAA
3 TGATCAAT
4 AGCGATCA
5 CGATCAAT
6 GATCAATG
7 TCAATGTG
8 CAATGTGA
1. Overlap graph
2. de Bruijn graph
3. String graph
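To make the three graph types concrete, the sketch below builds them from the eight reads on this slide. The minimum overlap of 5 bp and k = 4 are arbitrary choices for this toy example, and the string graph is approximated by dropping edges implied by a two-step path, a simplified form of transitive reduction. None of this is taken from Phusion2 itself; the function names are illustrative.

```python
from collections import defaultdict

# The eight 8-bp reads listed on the slide, in order (read 1 is READS[0]).
READS = [
    "ACCTGATC", "CTGATCAA", "TGATCAAT", "AGCGATCA",
    "CGATCAAT", "GATCAATG", "TCAATGTG", "CAATGTGA",
]


def overlap_graph(reads, min_olap=5):
    """Edge i -> j (1-based ids) when a suffix of read i exactly equals a prefix of read j.

    The value stored for each edge is the longest such overlap of at least min_olap bases.
    """
    edges = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i == j:
                continue
            for olen in range(min(len(a), len(b)) - 1, min_olap - 1, -1):
                if a[-olen:] == b[:olen]:
                    edges[(i + 1, j + 1)] = olen
                    break
    return edges


def string_graph(edges):
    """Simplified transitive reduction: drop a->c when a->b->c also exists."""
    succ = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
    return {
        (a, c): olen
        for (a, c), olen in edges.items()
        if not any(c in succ[b] for b in succ[a] if b != c)
    }


def de_bruijn_graph(reads, k=4):
    """Nodes are (k-1)-mers; every k-mer in a read adds the edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for p in range(len(read) - k + 1):
            kmer = read[p:p + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


if __name__ == "__main__":
    olap = overlap_graph(READS)
    print(sorted(olap.items()))
    print(sorted(string_graph(olap)))
    dbg = de_bruijn_graph(READS)
    # "GAT" has two predecessors (TGA from reads 1-3, CGA from reads 4-5): a branch point.
    print(sorted(node for node, nexts in dbg.items() if "GAT" in nexts))
```

In the overlap graph, read 1 overlaps both read 2 (6 bp) and read 3 (5 bp); the string graph keeps only 1 -> 2, because 1 -> 3 is implied by 1 -> 2 -> 3.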
Scaffold Merge and Contig Merge (diagram; labels: Ref, Base, Sup, Ctg)
ftp://ftp.sanger.ac.uk/pub/users/zn1/merge/
Contig Consensus using Gap5
Can we really trust Single Molecule Sequencing?
(Figure panels: PacBio, Capillary, Illumina)
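Clone Assemblies vs Assemblers: 5 BAC clones and 3 fosmids

| Clone | Length | SOAP N50* | SOAP Sub | SOAP Ind | ABySS N50* | ABySS Sub | ABySS Ind | iCAS N50* | iCAS Sub | iCAS Ind | Uncov |
|---|---|---|---|---|---|---|---|---|---|---|---|
| bE217O4 | 186945 | 59863 | 11 | 10 | 109235 | 0 | 2 | 109235 | 0 | 2 (2)** | 12 |
| bT237K12 | 130462 | 13717 | 57 | 32 | 23386 | 8 | 4 | 47205 | 8 | 4 (19)** | 626 |
| bE352A13 | 153875 | 31247 | 41 | 23 | 93010 | 8 | 15 | 132592 | 8 | 14 (65)** | 23 |
| bE367M14 | 154288 | 105083 | 40 | 9 | 31405 | 1 | 1 | 107394 | 0 | 1 (20)** | 1487 |
| bE378K21 | 207850 | 173047 | 11 | 10 | 54240 | 23 | 5 | 187396 | 0 | 1 (10)** | 741 |
| fSS328I2 | 42036 | 42087 | 3 | 5 | 12628 | 1 | 0 | 42047 | 0 | 0 | 0 |
| fSS404B14 | 32829 | 19543 | 0 | 3 | 29098 | 3 | 1 | 32832 | 0 | 0 | 0 |
| fSY5K10 | 41286 | 41352 | 0 | 3 | 41296 | 0 | 0 | 41296 | 0 | 0 | 0 |

Clone coverage: 99.7%; Base quality: Q39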
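N50 is used throughout these slides. Under the usual definition, it is the length L such that contigs (or scaffolds) of length at least L together contain at least half of all assembled bases. The sketch below computes it, along with the standard Phred conversion behind the "Q39" base-quality figure (error probability 10^(-Q/10)). The function names are illustrative.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def phred_to_error(q):
    """Phred quality Q corresponds to an error probability of 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 54 bases in total; 10 + 9 + 8 = 27 -> N50 = 8
    print(phred_to_error(39))  # about 1.26e-4, i.e. roughly 1 error in 7,900 bases
```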
Spinner – a scaffolding tool
Spinner uses mate pair data to scaffold contigs. Contigs (nodes) and the read pairs connecting pairs of contigs (edges) define a bi-directional graph.
Using the expected insert size, an estimate of the gap size can be given for each pair of linked contigs.
ftp://ftp.sanger.ac.uk/pub/users/zn1/spinner/
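A minimal sketch of gap-size estimation from mate pairs, assuming each link has already been oriented so that contig `a` precedes contig `b`. For one pair, if `d_a` bases of the insert lie on contig `a` (from the read's start to the contig end) and `d_b` bases lie on contig `b` (from the contig start to the far end of the mate), the implied gap is `insert_size - d_a - d_b`; the median over all supporting pairs gives a robust estimate. The input format, `min_support`, and the function name `estimate_gaps` are assumptions, not Spinner's interface.

```python
from collections import defaultdict
from statistics import median


def estimate_gaps(links, insert_size, min_support=2):
    """Hypothetical gap estimator for oriented contig links.

    links: iterable of (ctg_a, ctg_b, d_a, d_b), where
      d_a = bases from the read's 5' end to the 3' end of ctg_a,
      d_b = bases from the start of ctg_b to the mate's 5' end.
    Each pair implies gap = insert_size - d_a - d_b; the median over
    at least min_support pairs is reported for each (ctg_a, ctg_b) edge.
    """
    per_edge = defaultdict(list)
    for a, b, d_a, d_b in links:
        per_edge[(a, b)].append(insert_size - d_a - d_b)
    return {
        edge: median(gaps)
        for edge, gaps in per_edge.items()
        if len(gaps) >= min_support
    }


if __name__ == "__main__":
    links = [
        ("c1", "c2", 1200, 1500),
        ("c1", "c2", 1000, 1800),
        ("c1", "c2", 1400, 1200),
        ("c2", "c3", 500, 500),  # single pair: below min_support, ignored
    ]
    print(estimate_gaps(links, insert_size=3000))  # {('c1', 'c2'): 300}
```

A negative estimate suggests the two contigs overlap rather than being separated by a gap.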
Spinner – walks through a loop
These techniques alone produce useful results. Further stages will resolve repeats using pairs that “jump over” repeats, together with graph flow concepts.
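The simplest kind of graph walk extends a scaffold only while the path is unambiguous and stops at branches, repeats, or loops. The sketch below is a generic baseline of that kind, not Spinner's algorithm; the stopping points it leaves are the cases that repeat-spanning pairs and graph flow are meant to resolve. The edge format and the name `walk_unique` are hypothetical.

```python
from collections import defaultdict


def walk_unique(edges, start):
    """Extend a scaffold from `start` while the path is unambiguous.

    edges: iterable of (a, b) links meaning contig a is followed by contig b.
    Stops when a node has 0 or >1 successors, the successor has >1 predecessors,
    or a contig would be revisited (a loop).
    """
    succ, pred = defaultdict(set), defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
        pred[b].add(a)
    path, seen = [start], {start}
    node = start
    while len(succ[node]) == 1:
        (nxt,) = succ[node]
        if len(pred[nxt]) != 1 or nxt in seen:
            break
        path.append(nxt)
        seen.add(nxt)
        node = nxt
    return path


if __name__ == "__main__":
    print(walk_unique([("A", "B"), ("B", "C"), ("X", "Y")], "A"))  # ['A', 'B', 'C']
    # B is entered from both A and D (a loop B -> C -> D -> B), so the walk stops at A.
    print(walk_unique([("A", "B"), ("B", "C"), ("C", "D"), ("D", "B")], "A"))  # ['A']
```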
Spinner vs SSPACE

| Genome | Genome size | SSPACE N50 | SSPACE average | Spinner N50 | Spinner average |
|---|---|---|---|---|---|
| Assemblathon 1 | 119 Mb | 608 Kb | 86.8 Kb | 11 Mb | 450 Kb |
| Grass Carp (F) | 900 Mb | 2.3 Mb | 14.4 | 5.85 Mb | 17.1 Kb |
| Grass Carp (M) | 1000 Mb | 0.34 Mb | 11.2 Kb | 2.27 Mb | 8.2 Kb |
| Bamboo | 2.0 Gb | 322 Kb | 7404 | 488 Kb | 7689 |
| Parrot | 1.23 Gb | 906 Kb | 4675 | 1.32 Mb | 6969 |
Grass Phylogeny
Bamboo Genome: Size Estimation
$G_s = (K_n - K_s)/D = 1.97 \times 10^9$
where $K_n = 80.5 \times 10^9$ is the total number of k-mer words, $K_s = 9.5 \times 10^9$ is the number of single-copy k-mer words, and $D = 36$ is the depth of k-mer occurrence.
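The size formula can be checked directly, and the same idea applied to a k-mer depth histogram. In the histogram version, two interpretations are assumptions rather than statements from the slide: $K_s$ is taken to be the number of k-mers seen exactly once (treated as sequencing errors), and $D$ the depth of the main histogram peak, ignoring depth 1. The function names are hypothetical.

```python
def genome_size(kn, ks, depth):
    """G_s = (K_n - K_s) / D, as on the slide."""
    return (kn - ks) / depth


def genome_size_from_histogram(hist):
    """Estimate G_s from a k-mer depth histogram {depth: number of distinct k-mers}.

    Assumptions (not from the slide): K_s is the number of k-mers seen exactly
    once, and D is the depth of the main histogram peak, ignoring depth 1.
    """
    kn = sum(depth * n for depth, n in hist.items())  # total k-mer words
    ks = hist.get(1, 0)  # k-mers seen once, each contributing one word
    d = max((depth for depth in hist if depth > 1), key=lambda depth: hist[depth])
    return genome_size(kn, ks, d)


if __name__ == "__main__":
    print(genome_size(80.5e9, 9.5e9, 36))  # bamboo figures from the slide: ~1.97e9
    print(genome_size_from_histogram({1: 100, 2: 5, 3: 10, 4: 30, 5: 10}))  # 52.5
```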
Bamboo Genome Assembly
Solexa reads:
- Number of read pairs: 877 million
- Finished genome size: 2.0 Gb
- Read length: 2x100 bp
- Estimated read coverage: ~90X
- Insert size: 500/50-600 bp
- Mate pair data: 3k, 5k, 7k, 8k, 10k, 20k
- Number of reads clustered: 757 million

| Assembly features | Contigs | Scaffolds |
|---|---|---|
| Total number | 744,286 | 277,278 |
| Total bases | 1.86 Gb | 2.05 Gb |
| N50 size | 11,622 | 328,698 |
| Largest | 188,163 | 4,869,017 |
| Average size | 2,500 | 7,400 |
| Coverage on genome | ~90% | >95% |
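As a consistency check on the read statistics, fold coverage can be estimated as read pairs × 2 × read length / genome size. With 877 million pairs of 2x100 bp reads and a 2.0 Gb genome this gives about 87.7X, in line with the stated ~90X. The function name is illustrative.

```python
def read_coverage(read_pairs, read_length, genome_size):
    """Fold coverage = total sequenced bases / genome size (two reads per pair)."""
    return read_pairs * 2 * read_length / genome_size


if __name__ == "__main__":
    print(round(read_coverage(877e6, 100, 2.0e9), 1))  # 87.7
```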
Bamboo Genome Assembly QC using Finished BACs

| Metric | Assemblies by pure SOAPdenovo | Assemblies by SOAPdenovo & ABySS |
|---|---|---|
| Rate of single-base difference (# per Kb) | 2.28 | 0.43 |
| Rate of insertion and deletion (# per Kb) | 0.82 | 0.19 |
| Coverage by initial contigs | 0.76 | 0.85 |
| Coverage by supercontigs | 0.91 | 0.94 |
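A sketch of how QC metrics of this kind can be computed once assembly sequences have been aligned to a finished BAC: coverage is the union of the aligned intervals divided by the BAC length, and difference rates are event counts per 1,000 aligned bases. The alignment step is not shown, and the half-open `[start, end)` interval format and function names are assumptions.

```python
def covered_fraction(intervals, ref_length):
    """Fraction of a reference covered by the union of half-open [start, end) intervals."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current block: close it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / ref_length


def rate_per_kb(events, aligned_bases):
    """Differences (e.g. substitutions or indels) per 1,000 aligned bases."""
    return events * 1000 / aligned_bases


if __name__ == "__main__":
    print(covered_fraction([(0, 40), (30, 60), (80, 100)], 100))  # 0.8
    print(rate_per_kb(43, 100_000))  # 0.43
```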
Evolution of the Wheat Genome
Size of the Wheat Genome: 17 Gb
International Wheat Genome Sequencing Consortium
Sequencing of D Genome: Libraries & Insert Sizes

Short-insert libraries (167-764 bp):

| Library | Insert size (bp) |
|---|---|
| WHEjyyDADDBAAPE | 167 |
| WHEjjzDADDCBAPE | 199 |
| WHEjjzDADDCCAPE | 223 |
| WHEjjzDADDCABPE | 230 |
| WHEjyyDAEDDAAPE | 250 |
| WHEjyyDAEDDABPE | 250 |
| WHEjyyDAEDDBAPE | 250 |
| WHEjyyDAEDDBBPE | 250 |
| WHEjyyDAEDDCAPE | 250 |
| WHEjyyDAEDDCBPE | 250 |
| WHEjyyDAEDDDAPE | 250 |
| WHEjjzDADDCACPE | 254 |
| WHEjyyDAEDIAAPE | 500 |
| WHEjyyDAEDIBAPE | 500 |
| WHEjyyDADDIAAPE | 502 |
| WHEjyyDADDIDAPE | 510 |
| WHEjyyDADDICAPE | 527 |
| WHEjyyDADDIBAPE | 532 |
| WHEjyyDADDIBBPE | 551 |
| WHEjyyDADDKAAPE | 682 |
| WHEjyyDADDMBAPE | 706 |
| WHEjyyDADDKCAPE | 725 |
| WHEjyyDADDMAAPE | 764 |

Long-insert libraries (2,000-20,000 bp):

| Library | Insert size (bp) |
|---|---|
| WHEjyyDAADWAAPE | 2000 |
| WHEjyyDAADWBAPE | 2000 |
| WHEjyyDAADWCAPE | 2000 |
| WHEjyyDAADWDAPE | 2000 |
| WHEjyyDACDWAAPE | 2002 |
| WHEjyyDAEDWAAPE | 2008 |
| WHEjyyDACDWBBPE | 2500 |
| WHEjyyDAADLAAPE | 5000 |
| WHEjyyDAADLBAPE | 5000 |
| WHEjyyDAADLBBPE | 5000 |
| WHEjyyDAEDLAAPE | 5004 |
| WHEjjzDADLBBPE | 8300 |
| WHEjyyDAADTAAPE | 10000 |
| WHEjyyDABDTAAPE | 10000 |
| WHEjyyDADDTAAPE | 10000 |
| WHEjyyDADDTBBPE | 10000 |
| WHEjyyDAIDUAAPE | 20000 |
D Genome: Size Estimation
$G_s = (K_n - K_s)/D = 4.2 \times 10^9$
where $K_n = 59.8 \times 10^9$ is the total number of k-mer words, $K_s = 4.3 \times 10^9$ is the number of single-copy k-mer words, and $D = 13$ is the depth of k-mer occurrence.
Wheat D Genome Assembly
Solexa reads:
- Number of read pairs: 805 million
- Estimated genome size: 4.2 Gb
- Read length: 45-95 bp
- Estimated read coverage: ~40X
- Insert size: 167-800 bp
- Mate pair data: 2k-20k
- Number of reads clustered: 558 million

Assembly features (contigs):
- Total number of contigs: 3,228,623
- Total bases of contigs: 3.34 Gb
- N50 contig size: 3,084
- Largest contig: 86,064
- Average contig size: 1,035
- Contig coverage on genome: ~80%
Grass carp (F & M): 55,277 and 130,221; 0.88 Gb and 0.97 Gb; 40,353 and 18,252; 5.89 Mb and 2.27 Mb
Miscanthus; Wild rice
Acknowledgements:
Joe Henson, German Tischler, Andrew Whitwham
Chinese Academy of Agricultural Sciences: Jizeng Jia, Guangyue Zhao
National Gene Research Centre, Chinese Academy of Sciences: Han Bin, Hengyun Lu