Cross_genome: Assembly Scaffolding using Cross-species Synteny
Zemin Ning
High Performance Assembly
Target sequence
Reference
Scaffold 1
Scaffold 2
Scaffold 3
$Q = \mathrm{scaff}(i) \times 2^{32} + \mathrm{contig\_loci}(j)$
Lattice of Target-Reference
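The key $Q$ packs a target scaffold index and a locus into one integer, so every target-reference hit becomes a point in a 2-D lattice. A minimal sketch, assuming (hypothetically) that `contig_loci(j)` is a non-negative position below $2^{32}$ and that hits arrive as `(scaff_index, target_locus, ref_position)` tuples from some upstream alignment step; the function names are illustrative, not Cross_genome's.

```python
from __future__ import annotations

# Assumption: contig_loci(j) is a non-negative locus that fits in 32 bits,
# matching Q = scaff(i) * 2^32 + contig_loci(j) on the slide.
SHIFT = 32
LOCUS_MASK = (1 << SHIFT) - 1


def encode_q(scaff_index: int, locus: int) -> int:
    """Pack a target scaffold index and a locus within it into one key Q."""
    if scaff_index < 0 or not 0 <= locus <= LOCUS_MASK:
        raise ValueError("scaffold index must be >= 0 and locus must fit in 32 bits")
    return (scaff_index << SHIFT) + locus  # same as scaff_index * 2**32 + locus


def decode_q(q: int) -> tuple[int, int]:
    """Recover (scaffold index, locus) from Q."""
    return q >> SHIFT, q & LOCUS_MASK


def lattice_points(hits: list[tuple[int, int, int]]) -> list[tuple[int, int]]:
    """Turn hypothetical (scaff_index, target_locus, ref_position) hits into
    (Q, ref_position) lattice points, ordered along the reference."""
    return sorted(((encode_q(s, t), r) for s, t, r in hits), key=lambda p: p[1])
```

Because the scaffold index occupies the high bits, sorting on $Q$ orders points by scaffold first and by locus within a scaffold.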
Target sequence
Reference
Scaffold 1
After Noise Cleaning
Y
X
$\mathrm{Gap\_size} = Y - X$
Scaffold 2
Scaffold 3
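The slide shows noise cleaning followed by $\mathrm{Gap\_size} = Y - X$ between neighbouring scaffolds. The cleaning rule below is an assumption chosen for illustration (keep hits on the scaffold's majority reference chromosome that lie near their median position); X is taken as the rightmost reference position of the left scaffold and Y as the leftmost of the right scaffold.

```python
from collections import Counter
from statistics import median


def clean_hits(hits, max_dev=1_000_000):
    """Hypothetical noise filter for one scaffold's (ref_chrom, ref_pos) hits:
    keep hits on the majority chromosome within max_dev bp of their median."""
    if not hits:
        return []
    chrom = Counter(c for c, _ in hits).most_common(1)[0][0]
    on_chrom = [p for c, p in hits if c == chrom]
    mid = median(on_chrom)
    return [(chrom, p) for p in on_chrom if abs(p - mid) <= max_dev]


def gap_size(left_hits, right_hits):
    """Gap_size = Y - X: X is where the left scaffold ends on the reference,
    Y is where the right scaffold starts (both after cleaning)."""
    x = max(p for _, p in left_hits)
    y = min(p for _, p in right_hits)
    return y - x
```

For example, a left scaffold cleaned to hits at 100 and 500 and a right scaffold starting at 800 on the same chromosome give a gap of 300.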
Cases Shouldn’t Join
Reference
Target
Scaffold 1
Scaffold 2
Scaffold 2
Scaffold 1
Gap_size
Reference
Target
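The slide's rules for rejecting a join are only shown as figures, so the test below is a hypothetical sketch: it refuses a join when two scaffolds land on different reference chromosomes, overlap on the reference by more than a tolerance, or sit too far apart. The `Placement` type and both thresholds are assumptions, not Cross_genome's parameters.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """Hypothetical summary of where one scaffold lands on the reference."""
    chrom: str
    start: int  # leftmost reference position after noise cleaning
    end: int    # rightmost reference position after noise cleaning


def should_join(left: Placement, right: Placement,
                max_overlap: int = 1_000, max_gap: int = 100_000) -> bool:
    """Illustrative join test; thresholds are assumed values."""
    if left.chrom != right.chrom:
        return False                 # placed on different reference chromosomes
    gap = right.start - left.end     # Gap_size = Y - X
    if gap < -max_overlap:
        return False                 # scaffolds overlap or interleave on the reference
    return gap <= max_gap            # too far apart: leave the scaffolds unjoined
```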
              Original                    Cross_genome                RACA
Assembler     N_bases  N_scaffs  N50      N_bases  N_scaffs  N50
Allpaths-LG   88.8     418       81.6     89       221       85.5     86.8
Bambus2       78.6     1472      0.37     78.6     1094      13.7     72.1
CABOG         86.5     498       0.4      86.3     46        85.5     81.4
MSR-CA        89.7     1094      0.88     89.6     -         13.7     83.4
SGA           94.7     30975     0.075    94.8     29662     77.3     57.4
SOAPdenovo    108      38477     0.453    102.8    12955     78.9     84.4
Velvet        143.8    61455     0.84     139.4    3278      8.71     123
N_bases, N50 and RACA values in Mb.
GAGE: Human Chr14 and RACA using Orangutan
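The tables compare assemblies by N50. For reference, a minimal sketch of the standard definition: the largest length L such that sequences of length at least L together cover at least half of the total assembled bases.

```python
def n50(lengths):
    """Standard N50: walk sequences from longest to shortest and return the
    length at which the running total first reaches half the assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    return 0  # empty input
```

For lengths 2, 3, 4, 5 and 6 (total 20), the running sums are 6 and then 11, so the N50 is 5.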
Species            Original N50   Cross_genome N50   References
Panda              1.3 Mb         25 Mb              Dog, Human
Tibetan Antelope   2.6 Mb         42 Mb              Cattle, Dog, Human
Tasmanian Devil    1.8 Mb         6.8 Mb             Opossum
Scaffold N50 for Other Genome Assemblies
Availability
ftp://ftp.sanger.ac.uk/pub/users/zn1/merge/cross_genome/
Improve gorilla assembly using human reference
Contig Merge/Break
Variation correction
Contig gap size re-estimation
Read Alignment (Pair-wise/Multiple)
Combined Gorilla-Human Assembly
Human Reference
Gorilla Assembly
Final Gorilla Assembly
Gap size
New gap size
Target sequence
Reference sequence
Re-estimate Contig Gap Sizes from Reference
New gap size
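Once flanking contigs have been placed on the reference, each gap in the target scaffold can be resized to the reference-derived estimate. A hypothetical sketch, assuming gaps are runs of uppercase `N`, contigs contain no `N`, and the new size follows the same Y - X rule as scaffold gaps; negative estimates (flanking contigs overlapping on the reference) are clamped to a minimum gap, which is also an assumption.

```python
from __future__ import annotations

import re


def new_gap_size(x: int, y: int) -> int:
    """Y - X: X is where the left contig ends on the reference, Y is where
    the right contig starts."""
    return y - x


def resize_gaps(scaffold: str, new_sizes: list[int], min_gap: int = 1) -> str:
    """Replace the k-th run of Ns in `scaffold` with new_sizes[k] Ns,
    never writing fewer than min_gap Ns (hypothetical helper)."""
    contigs = re.split(r"N+", scaffold)
    if len(contigs) - 1 != len(new_sizes):
        raise ValueError("need exactly one new size per gap")
    parts = [contigs[0]]
    for size, contig in zip(new_sizes, contigs[1:]):
        parts.append("N" * max(size, min_gap))
        parts.append(contig)
    return "".join(parts)
```

For example, `resize_gaps("ACGTNNNNNGGCC", [2])` shrinks a 5 bp gap to 2 bp and returns `"ACGTNNGGCC"`.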
Read alignment and variation correction
Ref seq inserted
Contig Consensus using Gap5
Target (query) aligned against Reference
Before
Target (query) aligned against Reference
Reference Sequence Replacement & Variation Correction
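After reference sequence is inserted, bases copied from the reference may differ from the target genome and need correcting from the target's own reads. The slides do not describe the rule used, so this is a hypothetical majority-vote sketch: a position is changed only when enough aligned reads agree on a different base. The pileup format and both thresholds are assumptions; this is not Gap5's consensus algorithm.

```python
from __future__ import annotations

from collections import Counter


def correct_variants(seq: str, pileup: dict[int, list[str]],
                     min_depth: int = 3, min_frac: float = 0.8) -> str:
    """seq: sequence after reference insertion.
    pileup: {0-based position: read bases observed at that position}.
    Replace a base when at least min_frac of >= min_depth reads agree."""
    bases = list(seq)
    for pos, observed in pileup.items():
        if len(observed) < min_depth:
            continue  # too little read support to override the sequence
        base, count = Counter(observed).most_common(1)[0]
        if count / len(observed) >= min_frac and base != bases[pos]:
            bases[pos] = base
    return "".join(bases)
```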
Original Contig (query) against New Assembly after Contig Break
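The contig merge/break step splits contigs whose alignments to the reference are not collinear. The slides give no break criterion, so the rule below is an assumption: walking alignment blocks in contig order, break where the chromosome changes or the reference position drifts by more than `max_jump` from where a forward-strand, collinear block would start. `Block`, `break_points` and `split_contig` are hypothetical names; reverse-strand blocks are not handled.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    """One alignment block of a contig against the reference (hypothetical)."""
    q_start: int  # start on the contig (query)
    chrom: str    # reference chromosome
    r_start: int  # start on the reference


def break_points(blocks: List[Block], max_jump: int = 10_000) -> List[int]:
    """Contig positions at which consecutive blocks stop being collinear."""
    ordered = sorted(blocks, key=lambda b: b.q_start)
    cuts = []
    for prev, cur in zip(ordered, ordered[1:]):
        expected = prev.r_start + (cur.q_start - prev.q_start)  # forward-strand assumption
        if cur.chrom != prev.chrom or abs(cur.r_start - expected) > max_jump:
            cuts.append(cur.q_start)
    return cuts


def split_contig(seq: str, cuts: List[int]) -> List[str]:
    """Cut a contig sequence at the given positions."""
    edges = [0, *cuts, len(seq)]
    return [seq[a:b] for a, b in zip(edges, edges[1:])]
```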
                          Original   New
Total number of contigs   464,875    285,139
N50 contig size           11.7 kb    23.9 kb
Largest contig (bp)       191,556    322,733
Average contig size (bp)  6,085      9,928
The Gorilla Assemblies