Aug2015 analysis team spiral genetics
-
Upload
genomeinabottle -
Category
Health & Medicine
-
view
605 -
download
5
Transcript of Aug2015 analysis team spiral genetics
ANCHORED ASSEMBLYAccurate Structural Variant Detection Using Short-Read Data
AA
Bruestle, J.J. and Shekar, S.N.
Methodology
Anchoring
AA
Anchor Assemblies
7 7 7
7
8 89 9
7
8
7
R1 R2
R3 R5
R8R7
R3 R6 R9
Read overlapassembly
Read Overlap Assembly
Remove Reference ReadsRead Correction
0
0 200 400 600 800 1000 1200
1000
2000
3000
4000
5000
K-m
er C
ount
Total K-mer Quality Score
K-mer Quality Score Distribution
A* error correction
SV Comparison
Baylor College of Medicine, against Illumina, PacBio, Array, Nextera and BioNano
Program FDR Sensitivity
CNVnator 80.46% 22.62%
BreakDancer 58.89% 42.39%
Delly 55.13% 31.18%
Crest 14.87% 35.29%
Pindel 31.81% 56.70%
SVStat 1.79% 16.36%
Tiresias 69.04% 7.79%
Spiral 3.03% 42%
English et al. (2015), updated
AA
Fosmid/PacBio validated SVsAA
Validated in collaboration by Malig, M, Eichler, EE et al.
Selected 15 high confidence SVs not previously detected in the 1000 Genomes Project
PacBio validated SVs deleteAA
Chr Call Size (bp) Clones seq with PacBio
Validated by Micropeats
Validated by Dotplots
Call validated?
1 1026 2 2 2 yes
1 6375 2 2 2 yes
2 26838 2 2 2 yes
3 4184 1 1 1 yes
5 9507 2 2 2 yes
7 3013 1 1 1 yes
8 5157 2 1 1 yes
9 2883 1 1 1 yes
15 6051 2 2 2 yes
Malig, M, Eichler, EE et al. (Manuscript in preparation)
PacBio validated SVs InsertsAA
Malig, M, Eichler, EE et al. (Manuscript in preparation)
Chr Call Size (bp) Clones seq with PacBio
Validated by Micropeats
Validated by Dotplots
Call validated?
1 1755 2 2 2 yes
1 3865 2 2 2 yes
8 2457 2 2 2 yes
8 1508 2 2 2 yes
13 2142 2 2 2 yes
X 1548 2 2 2 yes
PacBio validation dotplotsAA
Malig, M, Eichler, EE et al. (Manuscript in preparation)
Chromosome 1
3.8kb insertion
Chromosome 1
6.4kb deletion
PacBio validation dotplotsAA
Malig, M, Eichler, EE et al. (Manuscript in preparation)
Chromosome 2
26.8kb deletion
PacBio validation dotplotsAA
Malig, M, Eichler, EE et al. (Manuscript in preparation)
Ashkenazi Jewish Trio AA
Validated by Noah Spies using his program SV Viz
Chr2 Deletion Chr8 Insertion
Reference
Chr 2 Deletion - FatherAA
Alternative
Chr 2 Deletion - MotherAA
Alternative
Reference
Chr 2 Deletion - OffspringAA
Alternative
Reference
Chr 2 Deletion - VCF IdenticalAA
HG002 - Offspring chr2 34695829 T <DEL> 100 PASS NS=1;DP=51;SVTYPE=DEL;END=34736567;SVLEN=-40730 DP:AD 51:21,30
HG003 - Father chr2 34695829 T <DEL> 100 PASS NS=1;DP=55;SVTYPE=DEL;END=34736567;SVLEN=-40730; DP:AD 55:0,55
Chr 8 Insertion - FatherAA
Alternative
Reference
Chr 8 Insertion - MotherAA
Alternative
Reference
Chr 8 Insertion - OffspringAA
Alternative
Reference
HG0004
chr8 129739066 AATAAA 100 masking_present NS=1;DP=41;SVTYPE=INS; END=129739071;SVLEN=3404; DP:AD 41:28,13
Chr 2 Insertion - VCF IdenticalAA
GAAAGTTGTGCACAATATAAAAATTATAATTTTATATTTCAAAACAAATTAAATATCTTAAAATTATAGAAGACATTAAAGAACTATATAAATCAAAGTTAGAAAAAAACCCAGATGTGAGTTGGGAAATCTGAAGAAAATTTAAAAGAGAAATTTAAAAATATTAATATAAAAAATGAAGACTAAACTGGAAAGATACACTAGAGTGAATAAACAAAATAGAAAATACTCAGATGTTTTGTTTTAACTTCCTATTGTATGAGTTTTGGAGACCAACATAAGATAATGACTTGCCTCTGGATATGAAGGTAAAAAAACAGACACAGGCCTATGTAGTGATTTCTTACAGAACAACACAGCAGAAAGCAAATCCCTAAAAAACCATGTGGACGTGGCTTTTACAGATGGTTGTCCAATCCCTGCATGCTATTGCTTGCTTATGGATGAGTGAAAGGAATAAAAATTTTAAGTTATAGCTACAGTTTCTCTACCTGTACATTCCAATACTGACCTTGCATGGTTTCTTTGAGGGCTAAGTATGACAAAAGGATGCAGTGATTTCGAACTTAGATTTTAAAAAACAAATATGACTCTTTTGAACTGTGTGAACATAGGCACATTGCTGGATCTGAGTAATTTCATGTGCTAAGAGGGAATAATAGCATCAGCCTTAATGCTGCATTATTGCATTTAGCATTATTTCTTTCTGAAGATGAAAGAAGACAGACATCATTATATTTAACAAAGTGCCTGGCACATATAAAATAGTCAATAAATGTTATCTATCATTGCTATTATTACCTAATACTGCGCATAGTAAGCCCTGAACCTGTTCCTGGCATGTGGAGCTATGTACTATGTTCATGACATTAAACAAAGTAGTAGCTATATAATGAATATATAAATGTGACTTTTATTATTACATCCCTGTAGTTTTGGCAAGTAGTTTACTAAAAGGAAGTTCCAATTTTGACTTAGCATGGAGTTTGTTTTTATCCTGGGCATGTTATCTACCCAGCTTACCTTATTCTTCCTTTCTTCGAAACAGGAATCATGCTTGTTAATTGACTGGGATGTTGCAAGGCCTTATATCTGAGAAGTATATCATCAAGGAATAGAGAAGATGAGGCTTAGAGAGGAAATGGAAGCCCCTGTCAATTAGGAGAGGCAAAGGCTCTATGATACAGCATAGCCTAAGAATTTCGTTGATACAGAATTCTAAGTGTGAAACGAATGAATGGAGTGACCACTCCTCCCTACTAAAGAATCTTGTAAACACTAGTTTTAAAAGCACAAACGTATTATATCATATACCTATGTATAATGTCATTTTGCTACTTTTCTCTTCCAACTTCTCAAATCTTTGAATGCAGGGGTTTTTGGAGTTATTCCCTGTGTATTATTTCGACTGATATGTAATAGCTGCTCAGCGAATGTGTGTTGCTAAATAAGAGATGGAGTACAGACAAGCTGAAATTGCACATTTATGTTGCCATTGTACTGCTCAAAAAAAAAAAAAATTAGAATTAGGGTTAAAGAGAGTGCTCAGGCCCTAGACTAGGATTTATTTGCTGTATAATAAACTTTATGCAAACAATTAACCTCCCTGCTTCTCAATTTTCTCCTCTATAAAATTGGGTTATTACAAAATTCTTTGTAACATATTATGGAGTTCAATTAGGATAAGTTAAATATTGGAAATCTGAAGCATTATGCAAATATGAGGCATTATTATAATAATTATAACAATATTGTTTTTATTCTTAATTGCTACTCTTGAGTACTCTGTTGCTCTGCAGATATCTCTCTCCCTGCCTTCCCCAGGGTGTTGGCATACCAGGATGCCTCTTTAGAAAAGAAATTGCACGATGGACCTAAGGAAACAGAATTTTCCATCCTGGCATTTGCATAAAGGCCACACATGCATAGCCATATGCTGATTTAACCAACAGCTTTCACACTTATAATCGAGTTTGCTACTTGTTCTGCGATATCTACTCTCCCTCTATTTCTTATTAATAGAACAAAATTTTAGTTGGGAATATGGCTACTGAGAATAAAGATTACCTTTCTCAGCTTCTTTGCGGCTAACTCTGATTATGTGTCTAAACTTTGGTGCATATTTTGGTAAACGGAAATTCTTAAAGGGAGATTCTCTACTTTCTTCCTGCTTGCTGGAATGCAAACATGATTGTTGGATTTGAGCAGCCATCTTATGCCAGGAGTTGGAAACCATGCAGGCAGATCCTGGTCATTAGCACTACAGACCTCTATACCAAACTGGATTTCTGTGAGACTCCAGAAGAAAGTAAGCAGCACAAGGAGTTTCTTCATGTATTCTTCATTTCCCACACCCCATTATACGTGCTTTTGCTGTAATCTGGAATCAGTTGTACTAATCTACTGCACATACCTAGATTCTATTGATAGTCTATTCCAGGATTGATAACTTTGAGCCCAGATAACTTGCAGTAAGATTTATAACAAGATTTCAAAAATATTCTTTCCTATACACCAAATAGTTTTGGTTAGAGAAAACAAAACTTTTGGCATAGCAACTTCATTTGTAGGAAGTTACCTTCTTAAAATTGTTTATCTGTGGACAGCTATGCTGCTATTAGTAGGGAATGGTTTCAGGCAAAAGGTTACAGAAGGATGGAGAGGGCCTGGGCTTTGGGGTTCCAGGGGTATGGAAGTCAGCAGAGCTGAGAGTAGTTCCCAACAGCCAGAGTGTCCATGGATCAAGCCCTTTTGTGAAGCTGGAGGTACCAGCGCTGGTCCAGGATGCGCAGCTGTAAAGTTGTGAATATATGTATTTGGTCTTTTTCCTTGTTTGCTGGCCTACAACTCTTAAAATCCTTGGAATCTTCAAAGTGATGTGTCTTTTTGTATGCTAATGAGTTGACTAATGGCTGGCAGCCTCTAGGTGGCTTCTGGATAAGAGCTGGTCACCAGGAAGACCAAGGCCAGATTAGAGGGTTGGGACATTCGGTCCTACTCCGCAACCACCATGGAGACAGTCTGAAGGTTAACTTGATCACCAATGGCCAATAATTTCATCAATCATGCCAGTGTAATGAAGCCAGCATAAAAACTCAAAAGGACAGGGCTCAGAGAGTTCCATTAGCTGAACATTGGAGGTTCCCACAAGTGGCATGCCCGGAGGGGGTTATGGAAGCTTCACACCCTTTCCCCATACCTCACCCTGTGCATCTCTTCATCTGTATCTTCTGTAATATCCTTTATAATACGCCATTAAATATAAGGAAGTATTTCTCTGAGTTCTGTGAGCCACTCTACCACATTAATCGAACCCCATGGGGAAGCTGAGTAAAGTTTCAAGTGGAGTAAAATTGCTGATACCGTGACCATCAGGTCAATGTTGCTGGAAGCACAGGTAAAACAAACT
Mother
GAAAGTTGTGCACAATATAAAAATTATAATTTTATATTTCAAAACAAATTAAATATCTTAAAATTATAGAAGACATTAAAGAACTATATAAATCAAAGTTAGAAAAAAACCCAGATGTGAGTTGGGAAATCTGAAGAAAATTTAAAAGAGAAATTTAAAAATATTAATATAAAAAATGAAGACTAAACTGGAAAGATACACTAGAGTGAATAAACAAAATAGAAAATACTCAGATGTTTTGTTTTAACTTCCTATTGTATGAGTTTTGGAGACCAACATAAGATAATGACTTGCCTCTGGATATGAAAGTAAAAAAACAGACACAGGCCTATGTAGTGATTTCTTACAGAACAACACAGCAGAAAGCAAATCCCTAACAAACCATGTGGACGTGGCTTTTACAGATGGTTGTCCAATCCCTGCATGCTATTGCTTGCTTATGGATGAGTGAAAGGAATAAAAATTTTAAGTTATAGCTACAGTTTCTCTACCTGTACATTCCAATACTGACCTTGCATGGTTTCTTTGAGGGCTAAGTATGACAAAAGGATGCAGTGATTTCGAACTTAGATTTTAAAAAACAAATATGACTCTTTTGAACTGTGTGAACATAGGCACATTGCTGGATCTGAGTAATTTCATGTGCTAAGAGGGAATAATAGCATCAGCCTTAATGCTGCATTATTGCATTTAGCATTATTTCTTTCTGAAGATGAAAGAAGACAGACATCATTATATTTAACAAAGTGCCTGGCACATATAAAATAGTCAATAAATGTTATCTATCATTGCTATTATTACCTAATACTGCGCATAGTAAGCCCTGAACCTGTTCCTGGCATGTGGAGCCATGTACTATGTTCATGACATTAAACAAAGTAGTAGCTATATAATGAATATATAAATGTGACTTTTATTATTACATCCCTGTAGTTTTGGCAAGTAGTTTACTAAAAGGAAGTTCCAATTTTGACTTAGCATGGAGTTTGTTTTTATCCTGGGCATGTTATCTACCCAGCTTACCTTATTCTTCCTTTCTTCGAAACAGGAATCATGCTTGTTAATTGACTGGGATGTTGCAAGGCCTTATATCTGAGAAGTATATCATCAAGGAATAGAGAAGATGAGTCTTAGAGAGGAAATGGAAGCCCCTGTCAATTAGGAGAGGCAAAGGCTCTATGATACAGCATAGCCTAAGAATTTCGTTGATACAGAATTCTAAGTGTGAAACGAATGAATGGAGTGACCACTCCTCCCTACTAAAGAATCTTGTAAACACTAGTTTTAAAAGCACAAACGTATTATATCATATACCTATGTATAATGTCATTTTGCTACTTTTCTCTTCCAACTTCTCAAATCTTTGAATGCAGGGGTTTTTGGAGTTATTCCCTGTGTATTATTTCGACTGATATGTAATAGCTGCTCAGCGAATGTGTGTTGCTAAATAAGAGATGGAGTACAGACAAGCTGAAATTGCACATTTATGTTGCCATTGTACTGCTCAAAAAAAAAAAAAAATTAGAATTAGGGTTAAAGAGAGTGCTCAGGCCCTAGACTAGGATTTATTTGCTGTATAATAAACTTTATGCAAACAATTAACCTCCCTGCTTCTCAATTTTCTCCTCTATAAAATTGGGTTATTACAAAATTCTTTGTAACATATTATGGAGTTCAATTAGGATAAGTTAAATATTGGAAATCTGAAGCATTATGCAAATATGAGGCATTATTATAATAATTATAACAATATTGTTTTTATTCTTAATTGCTACTCTTGAGTACTCTGTTGCTCTGCAGATATCTCTCTCCCTGCCTTCCCCAGGGTGTTGGCATACCAGGATGCCTCTTTAGAAAAGAAATTGTACGATCGACCTAAGGAAACAGAATTTTCCATCCTGGCATTTGCATAAAGGCCACACATGCATAGCCATATGCTGATTTAACCAACAGCTTTCACACTTATAATCGAGTTTGCTACTTGTTCTGCGATATCTACTCTCCCTCTATTTCTTATTAATAGAACAAAATTTTAGTTGGGAATATGGCTACTGAGAATAAAGATTACCTTTCTCAGCTTCTTTGCGGCTAACTCTGATTATGTGTCTAAACTTTGGTGCATATTTTGGTAAACGGAAATTCTTAAAGGGAGATTCTCTACTTTCTTCCTGCTTGCTGGAATGCAAACATGATTGTTGGATTTGAGCAGCCATCTTATGCCAGGAGTTGGAAACCATGCAGGCAGATCCTGGTCATTAGCACTACAGACCTCTATACCAAACTGGATTTCTGTGAGACTCCAGAAGAAAGTAAGCAGCACAAGGAGTTTCTTCATGTATTCTTCATTTCCCACACCCCATTATACGTGCTTTTGCTGTAATCTGGAATCAGTTGTACTAATCTACTGCACATACCTAGATTCTATTGATAGTCTATTCCAGGATTGATAACTTTGAGCCCAGATAACTTGCAGTAAGATTTATAACAAGATTTCAAAAATATTCTTTCCTATACACCAAATAGTTTTGGTTAGAGAAAACAAAACTTTTGGCATAGCAACTTCATTTGTAGGAAGTTACCTTCTTAAAATTGTTTATCTGTGGACAGCTATGCTGCTATTAGTAGGGAATGGTTTCAGGCAAGAGGTTACAGAAGGATGGAGAGGGCCTGGGCTTTGGGGTTCCAGGGGTATGGAAGTCAGCAGAGCTGAGAGTAGTTCCCAACAGCCAGAGTGTCCATGGATCAAGCCCTTTTGTGAAGCTGGAGGTACCAGCGCTGGTCCAGGATGCGCAGCTGTAAAGTTGTGAATATATGTATTTGGTCTTTTTCCTTGTTTGCTGGCCTACAACTCTTAAAATCCTTGGAATCTTCAAAGTGATGTGTCTTTTTGTATGCTAATGAGTTGACTAATGGCTGGCAGCCTCTAGGTGGCTTCTGGATAAGAGCTGGTCACCAGGAAGACCAAGGCCAGATTAGAGGGTTGGGACATTCGGTCCTACTCCGCAACCACCATGGAGACAGTCTGAAGGTTAACTTGATCACCAATGGCCAATAATTTCATCAATCATGCCAGTGTAATGAAGCCAGCATAAAAACTCAAAAGGACAGGGCTCAGAGAGTTCCATTAGCTGAACATTGGAGGTTCCCACAAGTGGCATGCCCGGAGGGGGTTATGGAAGCTTCACACCCTTTCCCCATACCTCACCCTGTGCATCTCTTCATCTGTATCTTCTGTAATATCCTTTATAATACGCCATTAAATATAAGGAAGTATTTCTCTGAGTTCTGTGAGCCACTCTACCACATTAATCGAACCCCATGGGGAAGCTGAGTAAAGTTTCAAGTGGAGTAAAATTGCTGATACCGTGACCATCAGGTCAATGTTGCTGGAAGCACAGGTAAAACAACCT
Chr 2 Insertion - VCF IdenticalAA
FatherHG003
chr8 129739066 AATAAA 100 masking_present NS=1;DP=47;SVTYPE=INS; END=129739071;SVLEN=3405;DP:AD 47:32,15
Chr 2 Insertion - VCF IdenticalAA
HG002
chr8 129739066 AATAAA 100 masking_present NS=1;DP=18;SVTYPE=INS; END=129739071;SVLEN=3405;DP:AD 18:0,18
GAAAGTTGTGCACAATATAAAAATTATAATTTTATATTTCAAAACAAATTAAATATCTTAAAATTATAGAAGACATTAAAGAACTATATAAATCAAAGTTAGAAAAAAACCCAGATGTGAGTTGGGAAATCTGAAGAAAATTTAAAAGAGAAATTTAAAAATATTAATATAAAAAATGAAGACTAAACTGGAAAGATACACTAGAGTGAATAAACAAAATAGAAAATACTCAGATGTTTTGTTTTAACTTCCTATTGTATGAGTTTTGGAGACCAACATAAGATAATGACTTGCCTCTGGATATGAAGGTAAAAAAACAGACACAGGCCTATGTAGTGATTTCTTACAGAACAACACAGCAGAAAGCAAATCCCTAAAAAACCATGTGGACGTGGCTTTTACAGATGGTTGTCCAATCCCTGCATGCTATTGCTTGCTTATGGATGAGTGAAAGGAATAAAAATTTTAAGTTATAGCTACAGTTTCTCTACCTGTACATTCCAATACTGACCTTGCATGGTTTCTTTGAGGGCTAAGTATGACAAAAGGATGCAGTGATTTCGAACTTAGATTTTAAAAAACAAATATGACTCTTTTGAACTGTGTGAACATAGGCACATTGCTGGATCTGAGTAATTTCATGTGCTAAGAGGGAATAATAGCATCAGCCTTAATGCTGCATTATTGCATTTAGCATTATTTCTTTCTGAAGATGAAAGAAGACAGACATCATTATATTTAACAAAGTGCCTGGCACATATAAAATAGTCAATAAATGTTATCTATCATTGCTATTATTACCTAATACTGCGCATAGTAAGCCCTGAACCTGTTCCTGGCATGTGGAGCTATGTACTATGTTCATGACATTAAACAAAGTAGTAGCTATATAATGAATATATAAATGTGACTTTTATTATTACATCCCTGTAGTTTTGGCAAGTAGTTTACTAAAAGGAAGTTCCAATTTTGACTTAGCATGGAGTTTGTTTTTATCCTGGGCATGTTATCTACCCAGCTTACCTTATTCTTCCTTTCTTCGAAACAGGAATCATGCTTGTTAATTGACTGGGATGTTGCAAGGCCTTATATCTGAGAAGTATATCATCAAGGAATAGAGAAGATGAGTCTTAGAGAGGAAATGGAAGCCCCTGTCAATTAGGAGAGGCAAAGGCTCTATGATACAGCATAGCCTAAGAATTTCGTTGATACAGAATTCTAAGTGTGAAACGAATGAATGGAGTGACCACTCCTCCCTACTAAAGAATCTTGTAAACACTAGTTTTAAAAGCACAAACGTATTATATCATATACCTATGTATAATGTCATTTTGCTACTTTTCTCTTCCAACTTCTCAAATCTTTGAATGCAGGGGTTTTTGGAGTTATTCCCTGTGTATTATTTCGACTGATATGTAATAGCTGCTCAGCGAATGTGTGTTGCTAAATAAGAGATGGAGTACAGACAAGCTGAAATTGCACATTTATGTTGCCATTGTACTGCTCAAAAAAAAAAAAAAATTAGAATTAGGGTTAAAGAGAGTGCTCAGGCCCTAGACTAGGATTTATTTGCTGTATAATAAACTTTATGCAAACAATTAACCTCCCTGCTTCTCAATTTTCTCCTCTATAAAATTGGGTTATTACAAAATTCTTTGTAACATATTATGGAGTTCAATTAGGATAAGTTAAATATTGGAAATCTGAAGCATTATGCAAATATGAGGCATTATTATAATAATTATAACAATATTGTTTTTATTCTTAATTGCTACTCTTGAGTACTCTGTTGCTCTGCAGATATCTCTCTCCCTGCCTTCCCCAGGGTGTTGGCATACCAGGATGCCTCTTTAGAAAAGAAATTGCACGATGGACCTAAGGAAACAGAATTTTCCATCCTGGCATTTGCATAAAGGCCACACATGCATAGCCATATGCTGATTTAACCAACAGCTTTCACACTTATAATCGAGTTTGCTACTTGTTCTGCGATATCTACTCTCCCTCTATTTCTTATTAATAGAACAAAATTTTAGTTGGGAATATGGCTACTGAGAATAAAGATTACCTTTCTCAGCTTCTTTGCGGCTAACTCTGATTATGTGTCTAAACTTTGGTGCATATTTTGGTAAACGGAAATTCTTAAAGGGAGATTCTCTACTTTCTTCCTGCTTGCTGGAATGCAAACATGATTGTTGGATTTGAGCAGCCATCTTATGCCAGGAGTTGGAAACCATGCAGGCAGATCCTGGTCATTAGCACTACAGACCTCTATACCAAACTGGATTTCTGTGAGACTCCAGAAGAAAGTAAGCAGCACAAGGAGTTTCTTCATGTATTCTTCATTTCCCACACCCCATTATACGTGCTTTTGCTGTAATCTGGAATCAGTTGTACTAATCTACTGCACATACCTAGATTCTATTGATAGTCTATTCCAGGATTGATAACTTTGAGCCCAGATAACTTGCAGTAAGATTTATAACAAGATTTCAAAAATATTCTTTCCTATACACCAAATAGTTTTGGTTAGAGAAAACAAAACTTTTGGCATAGCAACTTCATTTGTAGGAAGTTACCTTCTTAAAATTGTTTATCTGTGGACAGCTATGCTGCTATTAGTAGGGAATGGTTTCAGGCAAAAGGTTACAGAAGGATGGAGAGGGCCTGGGCTTTGGGGTTCCAGGGGTATGGAAGTCAGCAGAGCTGAGAGTAGTTCCCAACAGCCAGAGTGTCCATGGATCAAGCCCTTTTGTGAAGCTGGAGGTACCAGCGCTGGTCCAGGATGCGCAGCTGTAAAGTTGTGAATATATGTATTTGGTCTTTTTCCTTGTTTGCTGGCCTACAACTCTTAAAATCCTTGGAATCTTCAAAGTGATGTGTCTTTTTGTATGCTAATGAGTTGACTAATGGCTGGCAGCCTCTAGGTGGCTTCTGGATAAGAGCTGGTCACCAGGAAGACCAAGGCCAGATTAGAGGGTTGGGACATTCGGTCCTACTCCGCAACCACCATGGAGACAGTCTGAAGGTTAACTTGATCACCAATGGCCAATAATTTCATCAATCATGCCAGTGTAATGAAGCCAGCATAAAAACTCAAAAGGACAGGGCTCAGAGAGTTCCATTAGCTGAACATTGGAGGTTCCCACAAGTGGCATGCCCGGAGGGGGTTATGGAAGCTTCACACCCTTTCCCCATACCTCACCCTGTGCATCTCTTCATCTGTATCTTCTGTAATATCCTTTATAATACGCCATTAAATATAAGGAAGTATTTCTCTGAGTTCTGTGAGCCACTCTACCACATTAATCGAACCCCATGGGGAAGCTGAGTAAAGTTTCAAGTGGAGTAAAATTGCTGATACCGTGACCATCAGGTCAATGTTGCTGGAAGCACAGGTAAAACAACCT
Offspring
Chr 2 Insertion - VCF IdenticalAA
GAAAGTTGTGCACAATATAAAAATTATAATTTTATATTTCAAAACAAATTAAATATCTTAAAATTATAGAAGACATTAAAGAACTATATAAATCAAAGTTAGAAAAAAACCCAGATGTGAGTTGGGAAATCTGAAGAAAATTTAAAAGAGAAATTTAAAAATATTAATATAAAAAATGAAGACTAAACTGGAAAGATACACTAGAGTGAATAAACAAAATAGAAAATACTCAGATGTTTTGTTTTAACTTCCTATTGTATGAGTTTTGGAGACCAACATAAGATAATGACTTGCCTCTGGATATGAAGGTAAAAAAACAGACACAGGCCTATGTAGTGATTTCTTACAGAACAACACAGCAGAAAGCAAATCCCTAAAAAACCATGTGGACGTGGCTTTTACAGATGGTTGTCCAATCCCTGCATGCTATTGCTTGCTTATGGATGAGTGAAAGGAATAAAAATTTTAAGTTATAGCTACAGTTTCTCTACCTGTACATTCCAATACTGACCTTGCATGGTTTCTTTGAGGGCTAAGTATGACAAAAGGATGCAGTGATTTCGAACTTAGATTTTAAAAAACAAATATGACTCTTTTGAACTGTGTGAACATAGGCACATTGCTGGATCTGAGTAATTTCATGTGCTAAGAGGGAATAATAGCATCAGCCTTAATGCTGCATTATTGCATTTAGCATTATTTCTTTCTGAAGATGAAAGAAGACAGACATCATTATATTTAACAAAGTGCCTGGCACATATAAAATAGTCAATAAATGTTATCTATCATTGCTATTATTACCTAATACTGCGCATAGTAAGCCCTGAACCTGTTCCTGGCATGTGGAGCTATGTACTATGTTCATGACATTAAACAAAGTAGTAGCTATATAATGAATATATAAATGTGACTTTTATTATTACATCCCTGTAGTTTTGGCAAGTAGTTTACTAAAAGGAAGTTCCAATTTTGACTTAGCATGGAGTTTGTTTTTATCCTGGGCATGTTATCTACCCAGCTTACCTTATTCTTCCTTTCTTCGAAACAGGAATCATGCTTGTTAATTGACTGGGATGTTGCAAGGCCTTATATCTGAGAAGTATATCATCAAGGAATAGAGAAGATGAGGCTTAGAGAGGAAATGGAAGCCCCTGTCAATTAGGAGAGGCAAAGGCTCTATGATACAGCATAGCCTAAGAATTTCGTTGATACAGAATTCTAAGTGTGAAACGAATGAATGGAGTGACCACTCCTCCCTACTAAAGAATCTTGTAAACACTAGTTTTAAAAGCACAAACGTATTATATCATATACCTATGTATAATGTCATTTTGCTACTTTTCTCTTCCAACTTCTCAAATCTTTGAATGCAGGGGTTTTTGGAGTTATTCCCTGTGTATTATTTCGACTGATATGTAATAGCTGCTCAGCGAATGTGTGTTGCTAAATAAGAGATGGAGTACAGACAAGCTGAAATTGCACATTTATGTTGCCATTGTACTGCTCAAAAAAAAAAAAAATTAGAATTAGGGTTAAAGAGAGTGCTCAGGCCCTAGACTAGGATTTATTTGCTGTATAATAAACTTTATGCAAACAATTAACCTCCCTGCTTCTCAATTTTCTCCTCTATAAAATTGGGTTATTACAAAATTCTTTGTAACATATTATGGAGTTCAATTAGGATAAGTTAAATATTGGAAATCTGAAGCATTATGCAAATATGAGGCATTATTATAATAATTATAACAATATTGTTTTTATTCTTAATTGCTACTCTTGAGTACTCTGTTGCTCTGCAGATATCTCTCTCCCTGCCTTCCCCAGGGTGTTGGCATACCAGGATGCCTCTTTAGAAAAGAAATTGCACGATGGACCTAAGGAAACAGAATTTTCCATCCTGGCATTTGCATAAAGGCCACACATGCATAGCCATATGCTGATTTAACCAACAGCTTTCACACTTATAATCGAGTTTGCTACTTGTTCTGCGATATCTACTCTCCCTCTATTTCTTATTAATAGAACAAAATTTTAGTTGGGAATATGGCTACTGAGAATAAAGATTACCTTTCTCAGCTTCTTTGCGGCTAACTCTGATTATGTGTCTAAACTTTGGTGCATATTTTGGTAAACGGAAATTCTTAAAGGGAGATTCTCTACTTTCTTCCTGCTTGCTGGAATGCAAACATGATTGTTGGATTTGAGCAGCCATCTTATGCCAGGAGTTGGAAACCATGCAGGCAGATCCTGGTCATTAGCACTACAGACCTCTATACCAAACTGGATTTCTGTGAGACTCCAGAAGAAAGTAAGCAGCACAAGGAGTTTCTTCATGTATTCTTCATTTCCCACACCCCATTATACGTGCTTTTGCTGTAATCTGGAATCAGTTGTACTAATCTACTGCACATACCTAGATTCTATTGATAGTCTATTCCAGGATTGATAACTTTGAGCCCAGATAACTTGCAGTAAGATTTATAACAAGATTTCAAAAATATTCTTTCCTATACACCAAATAGTTTTGGTTAGAGAAAACAAAACTTTTGGCATAGCAACTTCATTTGTAGGAAGTTACCTTCTTAAAATTGTTTATCTGTGGACAGCTATGCTGCTATTAGTAGGGAATGGTTTCAGGCAAAAGGTTACAGAAGGATGGAGAGGGCCTGGGCTTTGGGGTTCCAGGGGTATGGAAGTCAGCAGAGCTGAGAGTAGTTCCCAACAGCCAGAGTGTCCATGGATCAAGCCCTTTTGTGAAGCTGGAGGTACCAGCGCTGGTCCAGGATGCGCAGCTGTAAAGTTGTGAATATATGTATTTGGTCTTTTTCCTTGTTTGCTGGCCTACAACTCTTAAAATCCTTGGAATCTTCAAAGTGATGTGTCTTTTTGTATGCTAATGAGTTGACTAATGGCTGGCAGCCTCTAGGTGGCTTCTGGATAAGAGCTGGTCACCAGGAAGACCAAGGCCAGATTAGAGGGTTGGGACATTCGGTCCTACTCCGCAACCACCATGGAGACAGTCTGAAGGTTAACTTGATCACCAATGGCCAATAATTTCATCAATCATGCCAGTGTAATGAAGCCAGCATAAAAACTCAAAAGGACAGGGCTCAGAGAGTTCCATTAGCTGAACATTGGAGGTTCCCACAAGTGGCATGCCCGGAGGGGGTTATGGAAGCTTCACACCCTTTCCCCATACCTCACCCTGTGCATCTCTTCATCTGTATCTTCTGTAATATCCTTTATAATACGCCATTAAATATAAGGAAGTATTTCTCTGAGTTCTGTGAGCCACTCTACCACATTAATCGAACCCCATGGGGAAGCTGAGTAAAGTTTCAAGTGGAGTAAAATTGCTGATACCGTGACCATCAGGTCAATGTTGCTGGAAGCACAGGTAAAACAAACT
Overlay - SNPs from both parents are present in the Offspring
GAAAGTTGTGCACAATATAAAAATTATAATTTTATATTTCAAAACAAATTAAATATCTTAAAATTATAGAAGACATTAAAGAACTATATAAATCAAAGTTAGAAAAAAACCCAGATGTGAGTTGGGAAATCTGAAGAAAATTTAAAAGAGAAATTTAAAAATATTAATATAAAAAATGAAGACTAAACTGGAAAGATACACTAGAGTGAATAAACAAAATAGAAAATACTCAGATGTTTTGTTTTAACTTCCTATTGTATGAGTTTTGGAGACCAACATAAGATAATGACTTGCCTCTGGATATGAAAGTAAAAAAACAGACACAGGCCTATGTAGTGATTTCTTACAGAACAACACAGCAGAAAGCAAATCCCTAACAAACCATGTGGACGTGGCTTTTACAGATGGTTGTCCAATCCCTGCATGCTATTGCTTGCTTATGGATGAGTGAAAGGAATAAAAATTTTAAGTTATAGCTACAGTTTCTCTACCTGTACATTCCAATACTGACCTTGCATGGTTTCTTTGAGGGCTAAGTATGACAAAAGGATGCAGTGATTTCGAACTTAGATTTTAAAAAACAAATATGACTCTTTTGAACTGTGTGAACATAGGCACATTGCTGGATCTGAGTAATTTCATGTGCTAAGAGGGAATAATAGCATCAGCCTTAATGCTGCATTATTGCATTTAGCATTATTTCTTTCTGAAGATGAAAGAAGACAGACATCATTATATTTAACAAAGTGCCTGGCACATATAAAATAGTCAATAAATGTTATCTATCATTGCTATTATTACCTAATACTGCGCATAGTAAGCCCTGAACCTGTTCCTGGCATGTGGAGCCATGTACTATGTTCATGACATTAAACAAAGTAGTAGCTATATAATGAATATATAAATGTGACTTTTATTATTACATCCCTGTAGTTTTGGCAAGTAGTTTACTAAAAGGAAGTTCCAATTTTGACTTAGCATGGAGTTTGTTTTTATCCTGGGCATGTTATCTACCCAGCTTACCTTATTCTTCCTTTCTTCGAAACAGGAATCATGCTTGTTAATTGACTGGGATGTTGCAAGGCCTTATATCTGAGAAGTATATCATCAAGGAATAGAGAAGATGAGTCTTAGAGAGGAAATGGAAGCCCCTGTCAATTAGGAGAGGCAAAGGCTCTATGATACAGCATAGCCTAAGAATTTCGTTGATACAGAATTCTAAGTGTGAAACGAATGAATGGAGTGACCACTCCTCCCTACTAAAGAATCTTGTAAACACTAGTTTTAAAAGCACAAACGTATTATATCATATACCTATGTATAATGTCATTTTGCTACTTTTCTCTTCCAACTTCTCAAATCTTTGAATGCAGGGGTTTTTGGAGTTATTCCCTGTGTATTATTTCGACTGATATGTAATAGCTGCTCAGCGAATGTGTGTTGCTAAATAAGAGATGGAGTACAGACAAGCTGAAATTGCACATTTATGTTGCCATTGTACTGCTCAAAAAAAAAAAAAATTAGAATTAGGGTTAAAGAGAGTGCTCAGGCCCTAGACTAGGATTTATTTGCTGTATAATAAACTTTATGCAAACAATTAACCTCCCTGCTTCTCAATTTTCTCCTCTATAAAATTGGGTTATTACAAAATTCTTTGTAACATATTATGGAGTTCAATTAGGATAAGTTAAATATTGGAAATCTGAAGCATTATGCAAATATGAGGCATTATTATAATAATTATAACAATATTGTTTTTATTCTTAATTGCTACTCTTGAGTACTCTGTTGCTCTGCAGATATCTCTCTCCCTGCCTTCCCCAGGGTGTTGGCATACCAGGATGCCTCTTTAGAAAAGAAATTGTACGATCGACCTAAGGAAACAGAATTTTCCATCCTGGCATTTGCATAAAGGCCACACATGCATAGCCATATGCTGATTTAACCAACAGCTTTCACACTTATAATCGAGTTTGCTACTTGTTCTGCGATATCTACTCTCCCTCTATTTCTTATTAATAGAACAAAATTTTAGTTGGGAATATGGCTACTGAGAATAAAGATTACCTTTCTCAGCTTCTTTGCGGCTAACTCTGATTATGTGTCTAAACTTTGGTGCATATTTTGGTAAACGGAAATTCTTAAAGGGAGATTCTCTACTTTCTTCCTGCTTGCTGGAATGCAAACATGATTGTTGGATTTGAGCAGCCATCTTATGCCAGGAGTTGGAAACCATGCAGGCAGATCCTGGTCATTAGCACTACAGACCTCTATACCAAACTGGATTTCTGTGAGACTCCAGAAGAAAGTAAGCAGCACAAGGAGTTTCTTCATGTATTCTTCATTTCCCACACCCCATTATACGTGCTTTTGCTGTAATCTGGAATCAGTTGTACTAATCTACTGCACATACCTAGATTCTATTGATAGTCTATTCCAGGATTGATAACTTTGAGCCCAGATAACTTGCAGTAAGATTTATAACAAGATTTCAAAAATATTCTTTCCTATACACCAAATAGTTTTGGTTAGAGAAAACAAAACTTTTGGCATAGCAACTTCATTTGTAGGAAGTTACCTTCTTAAAATTGTTTATCTGTGGACAGCTATGCTGCTATTAGTAGGGAATGGTTTCAGGCAAGAGGTTACAGAAGGATGGAGAGGGCCTGGGCTTTGGGGTTCCAGGGGTATGGAAGTCAGCAGAGCTGAGAGTAGTTCCCAACAGCCAGAGTGTCCATGGATCAAGCCCTTTTGTGAAGCTGGAGGTACCAGCGCTGGTCCAGGATGCGCAGCTGTAAAGTTGTGAATATATGTATTTGGTCTTTTTCCTTGTTTGCTGGCCTACAACTCTTAAAATCCTTGGAATCTTCAAAGTGATGTGTCTTTTTGTATGCTAATGAGTTGACTAATGGCTGGCAGCCTCTAGGTGGCTTCTGGATAAGAGCTGGTCACCAGGAAGACCAAGGCCAGATTAGAGGGTTGGGACATTCGGTCCTACTCCGCAACCACCATGGAGACAGTCTGAAGGTTAACTTGATCACCAATGGCCAATAATTTCATCAATCATGCCAGTGTAATGAAGCCAGCATAAAAACTCAAAAGGACAGGGCTCAGAGAGTTCCATTAGCTGAACATTGGAGGTTCCCACAAGTGGCATGCCCGGAGGGGGTTATGGAAGCTTCACACCCTTTCCCCATACCTCACCCTGTGCATCTCTTCATCTGTATCTTCTGTAATATCCTTTATAATACGCCATTAAATATAAGGAAGTATTTCTCTGAGTTCTGTGAGCCACTCTACCACATTAATCGAACCCCATGGGGAAGCTGAGTAAAGTTTCAAGTGGAGTAAAATTGCTGATACCGTGACCATCAGGTCAATGTTGCTGGAAGCACAGGTAAAACAACCT
GAAAGTTGTGCACAATATAAAAATTATAATTTTATATTTCAAAACAAATTAAATATCTTAAAATTATAGAAGACATTAAAGAACTATATAAATCAAAGTTAGAAAAAAACCCAGATGTGAGTTGGGAAATCTGAAGAAAATTTAAAAGAGAAATTTAAAAATATTAATATAAAAAATGAAGACTAAACTGGAAAGATACACTAGAGTGAATAAACAAAATAGAAAATACTCAGATGTTTTGTTTTAACTTCCTATTGTATGAGTTTTGGAGACCAACATAAGATAATGACTTGCCTCTGGATATGAAGGTAAAAAAACAGACACAGGCCTATGTAGTGATTTCTTACAGAACAACACAGCAGAAAGCAAATCCCTAAAAAACCATGTGGACGTGGCTTTTACAGATGGTTGTCCAATCCCTGCATGCTATTGCTTGCTTATGGATGAGTGAAAGGAATAAAAATTTTAAGTTATAGCTACAGTTTCTCTACCTGTACATTCCAATACTGACCTTGCATGGTTTCTTTGAGGGCTAAGTATGACAAAAGGATGCAGTGATTTCGAACTTAGATTTTAAAAAACAAATATGACTCTTTTGAACTGTGTGAACATAGGCACATTGCTGGATCTGAGTAATTTCATGTGCTAAGAGGGAATAATAGCATCAGCCTTAATGCTGCATTATTGCATTTAGCATTATTTCTTTCTGAAGATGAAAGAAGACAGACATCATTATATTTAACAAAGTGCCTGGCACATATAAAATAGTCAATAAATGTTATCTATCATTGCTATTATTACCTAATACTGCGCATAGTAAGCCCTGAACCTGTTCCTGGCATGTGGAGCTATGTACTATGTTCATGACATTAAACAAAGTAGTAGCTATATAATGAATATATAAATGTGACTTTTATTATTACATCCCTGTAGTTTTGGCAAGTAGTTTACTAAAAGGAAGTTCCAATTTTGACTTAGCATGGAGTTTGTTTTTATCCTGGGCATGTTATCTACCCAGCTTACCTTATTCTTCCTTTCTTCGAAACAGGAATCATGCTTGTTAATTGACTGGGATGTTGCAAGGCCTTATATCTGAGAAGTATATCATCAAGGAATAGAGAAGATGAGTCTTAGAGAGGAAATGGAAGCCCCTGTCAATTAGGAGAGGCAAAGGCTCTATGATACAGCATAGCCTAAGAATTTCGTTGATACAGAATTCTAAGTGTGAAACGAATGAATGGAGTGACCACTCCTCCCTACTAAAGAATCTTGTAAACACTAGTTTTAAAAGCACAAACGTATTATATCATATACCTATGTATAATGTCATTTTGCTACTTTTCTCTTCCAACTTCTCAAATCTTTGAATGCAGGGGTTTTTGGAGTTATTCCCTGTGTATTATTTCGACTGATATGTAATAGCTGCTCAGCGAATGTGTGTTGCTAAATAAGAGATGGAGTACAGACAAGCTGAAATTGCACATTTATGTTGCCATTGTACTGCTCAAAAAAAAAAAAAATTAGAATTAGGGTTAAAGAGAGTGCTCAGGCCCTAGACTAGGATTTATTTGCTGTATAATAAACTTTATGCAAACAATTAACCTCCCTGCTTCTCAATTTTCTCCTCTATAAAATTGGGTTATTACAAAATTCTTTGTAACATATTATGGAGTTCAATTAGGATAAGTTAAATATTGGAAATCTGAAGCATTATGCAAATATGAGGCATTATTATAATAATTATAACAATATTGTTTTTATTCTTAATTGCTACTCTTGAGTACTCTGTTGCTCTGCAGATATCTCTCTCCCTGCCTTCCCCAGGGTGTTGGCATACCAGGATGCCTCTTTAGAAAAGAAATTGCACGATGGACCTAAGGAAACAGAATTTTCCATCCTGGCATTTGCATAAAGGCCACACATGCATAGCCATATGCTGATTTAACCAACAGCTTTCACACTTATAATCGAGTTTGCTACTTGTTCTGCGATATCTACTCTCCCTCTATTTCTTATTAATAGAACAAAATTTTAGTTGGGAATATGGCTACTGAGAATAAAGATTACCTTTCTCAGCTTCTTTGCGGCTAACTCTGATTATGTGTCTAAACTTTGGTGCATATTTTGGTAAACGGAAATTCTTAAAGGGAGATTCTCTACTTTCTTCCTGCTTGCTGGAATGCAAACATGATTGTTGGATTTGAGCAGCCATCTTATGCCAGGAGTTGGAAACCATGCAGGCAGATCCTGGTCATTAGCACTACAGACCTCTATACCAAACTGGATTTCTGTGAGACTCCAGAAGAAAGTAAGCAGCACAAGGAGTTTCTTCATGTATTCTTCATTTCCCACACCCCATTATACGTGCTTTTGCTGTAATCTGGAATCAGTTGTACTAATCTACTGCACATACCTAGATTCTATTGATAGTCTATTCCAGGATTGATAACTTTGAGCCCAGATAACTTGCAGTAAGATTTATAACAAGATTTCAAAAATATTCTTTCCTATACACCAAATAGTTTTGGTTAGAGAAAACAAAACTTTTGGCATAGCAACTTCATTTGTAGGAAGTTACCTTCTTAAAATTGTTTATCTGTGGACAGCTATGCTGCTATTAGTAGGGAATGGTTTCAGGCAAAAGGTTACAGAAGGATGGAGAGGGCCTGGGCTTTGGGGTTCCAGGGGTATGGAAGTCAGCAGAGCTGAGAGTAGTTCCCAACAGCCAGAGTGTCCATGGATCAAGCCCTTTTGTGAAGCTGGAGGTACCAGCGCTGGTCCAGGATGCGCAGCTGTAAAGTTGTGAATATATGTATTTGGTCTTTTTCCTTGTTTGCTGGCCTACAACTCTTAAAATCCTTGGAATCTTCAAAGTGATGTGTCTTTTTGTATGCTAATGAGTTGACTAATGGCTGGCAGCCTCTAGGTGGCTTCTGGATAAGAGCTGGTCACCAGGAAGACCAAGGCCAGATTAGAGGGTTGGGACATTCGGTCCTACTCCGCAACCACCATGGAGACAGTCTGAAGGTTAACTTGATCACCAATGGCCAATAATTTCATCAATCATGCCAGTGTAATGAAGCCAGCATAAAAACTCAAAAGGACAGGGCTCAGAGAGTTCCATTAGCTGAACATTGGAGGTTCCCACAAGTGGCATGCCCGGAGGGGGTTATGGAAGCTTCACACCCTTTCCCCATACCTCACCCTGTGCATCTCTTCATCTGTATCTTCTGTAATATCCTTTATAATACGCCATTAAATATAAGGAAGTATTTCTCTGAGTTCTGTGAGCCACTCTACCACATTAATCGAACCCCATGGGGAAGCTGAGTAAAGTTTCAAGTGGAGTAAAATTGCTGATACCGTGACCATCAGGTCAATGTTGCTGGAAGCACAGGTAAAACAACCT
A
What is GraphBWT?
Core Technology to Make Anchored Assembly Feasible
• Needed a way to represent the read data that was graph based
• Fast search for variation from reference directly from the reads in a whole genome dataset
• Small enough footprint to store a read overlap graph of whole human genome in memory
23
GraphBWT
• Technology for storing all of the reads that comprise the variation graph of a whole human genome
• Very compact to fit into memory (1.5 bytes per base)
• In memory, allows for extremely fast searches via subsequence
24
Resulting technologies that use GraphBWT
SpEC SV and Query
• SpEC: A lossless compression format that reduces BAM files to 50% of their original size and that can be analyzed with existing bioinformatics tools while compressed
• SpEC SV: SpEC that also includes a compact sequence index, known as a GraphBWT (3GB), which is a graph based representation of genomic variation
• SpEC Query: an API that reads SpEC SV files to enable rapid queries of sequence data via location or by a subsequence
26
27
Create a SpEC SV File
Spiral’s SpEC SV File
Query Times
28
Samples, Variant calls SpEC Query using SpEC SpEC Query using SpEC SV
1 sample, 1 variant Milliseconds Milliseconds
1 sample, 1M variants 10 Minutes 5 Minutes
1000 samples, 1 variant 10-‐20 Minutes 10-‐20 Minutes
1000 samples, 1M variants 4 Days 2 Days
Variant types SNPs and Indels SNPs, Indels and SVs
GraphBWT Technical Details
• Constant time traversal of k-mer graph for any sized k-mer
• Subsequence search linear with size of sequence
• Storage requirements grow linearly with size of novel sequence (i.e. variation)
29
Use Cases for SpEC SV
• Search for evidence of variation in read data
• Compare graphs between individuals for unique variation
• Compare combined graphs of two groups
• Store variation, for example a reference genome
30
Questions?
Niranjan Shekar - VP of Bioinformatics [email protected]