Aug2015 analysis team spiral genetics

31
ANCHORED ASSEMBLY Accurate Structural Variant Detection Using Short-Read Data AA Bruestle, J.J. and Shekar, S.N.

Transcript of Aug2015 analysis team spiral genetics

Page 1: Aug2015 analysis team spiral genetics

ANCHORED ASSEMBLYAccurate Structural Variant Detection Using Short-Read Data

AA

Bruestle, J.J. and Shekar, S.N.

Page 2: Aug2015 analysis team spiral genetics

Methodology

Anchoring

AA

Anchor Assemblies

7 7 7

7

8 89 9

7

8

7

R1 R2

R3 R5

R8R7

R3 R6 R9

Read overlapassembly

Read Overlap Assembly

Remove Reference ReadsRead Correction

0

0 200 400 600 800 1000 1200

1000

2000

3000

4000

5000

K-m

er C

ount

Total K-mer Quality Score

K-mer Quality Score Distribution

A* error correction

Page 3: Aug2015 analysis team spiral genetics

SV Comparison

Baylor College of Medicine, against Illumina, PacBio, Array, Nextera and BioNano

Program FDR Sensitivity

CNVnator 80.46% 22.62%

BreakDancer 58.89% 42.39%

Delly 55.13% 31.18%

Crest 14.87% 35.29%

Pindel 31.81% 56.70%

SVStat 1.79% 16.36%

Tiresias 69.04% 7.79%

Spiral 3.03% 42%

English et al. (2015), updated

AA

Page 4: Aug2015 analysis team spiral genetics

Fosmid/PacBio validated SVsAA

Validated in collaboration by Malig, M, Eichler, EE et al.

Selected 15 high confidence SVs not previously detected in the 1000 Genomes Project

Page 5: Aug2015 analysis team spiral genetics

PacBio validated SVs deleteAA

Chr Call  Size  (bp) Clones  seq  with  PacBio

Validated  by  Micropeats

Validated  by  Dotplots

Call  validated?

1 1026 2 2 2 yes

1 6375 2 2 2 yes

2 26838 2 2 2 yes

3 4184 1 1 1 yes

5 9507 2 2 2 yes

7 3013 1 1 1 yes

8 5157 2 1 1 yes

9 2883 1 1 1 yes

15 6051 2 2 2 yes

Malig,  M,  Eichler,  EE  et  al.  (Manuscript  in  preparation)

Page 6: Aug2015 analysis team spiral genetics

PacBio validated SVs InsertsAA

Malig,  M,  Eichler,  EE  et  al.  (Manuscript  in  preparation)

Chr Call  Size  (bp) Clones  seq  with  PacBio

Validated  by  Micropeats

Validated  by  Dotplots

Call  validated?

1 1755 2 2 2 yes

1 3865 2 2 2 yes

8 2457 2 2 2 yes

8 1508 2 2 2 yes

13 2142 2 2 2 yes

X 1548 2 2 2 yes

Page 7: Aug2015 analysis team spiral genetics

PacBio validation dotplotsAA

Malig,  M,  Eichler,  EE  et  al.  (Manuscript  in  preparation)

Chromosome  1  

3.8kb  insertion

Page 8: Aug2015 analysis team spiral genetics

Chromosome  1  

6.4kb  deletion

PacBio validation dotplotsAA

Malig,  M,  Eichler,  EE  et  al.  (Manuscript  in  preparation)

Page 9: Aug2015 analysis team spiral genetics

Chromosome  2  

26.8kb  deletion

PacBio validation dotplotsAA

Malig,  M,  Eichler,  EE  et  al.  (Manuscript  in  preparation)

Page 10: Aug2015 analysis team spiral genetics

Ashkenazi Jewish Trio AA

Validated by Noah Spies using his program SV Viz

Chr2 Deletion Chr8 Insertion

Page 11: Aug2015 analysis team spiral genetics

Reference

Chr 2 Deletion - FatherAA

Alternative

Page 12: Aug2015 analysis team spiral genetics

Chr 2 Deletion - MotherAA

Alternative

Reference

Page 13: Aug2015 analysis team spiral genetics

Chr 2 Deletion - OffspringAA

Alternative

Reference

Page 14: Aug2015 analysis team spiral genetics

Chr 2 Deletion - VCF IdenticalAA

HG002 - Offspring chr2 34695829 T <DEL> 100 PASS NS=1;DP=51;SVTYPE=DEL;END=34736567;SVLEN=-40730 DP:AD 51:21,30

HG003 - Father chr2 34695829 T <DEL> 100 PASS NS=1;DP=55;SVTYPE=DEL;END=34736567;SVLEN=-40730; DP:AD 55:0,55

Page 15: Aug2015 analysis team spiral genetics

Chr 8 Insertion - FatherAA

Alternative

Reference

Page 16: Aug2015 analysis team spiral genetics

Chr 8 Insertion - MotherAA

Alternative

Reference

Page 17: Aug2015 analysis team spiral genetics

Chr 8 Insertion - OffspringAA

Alternative

Reference

Page 18: Aug2015 analysis team spiral genetics

HG0004

chr8 129739066 AATAAA 100 masking_present NS=1;DP=41;SVTYPE=INS; END=129739071;SVLEN=3404; DP:AD 41:28,13

Chr 2 Insertion - VCF IdenticalAA

GAAAGTTGTGCACAATATAAAAATTATAATTTTATATTTCAAAACAAATTAAATATCTTAAAATTATAGAAGACATTAAAGAACTATATAAATCAAAGTTAGAAAAAAACCCAGATGTGAGTTGGGAAATCTGAAGAAAATTTAAAAGAGAAATTTAAAAATATTAATATAAAAAATGAAGACTAAACTGGAAAGATACACTAGAGTGAATAAACAAAATAGAAAATACTCAGATGTTTTGTTTTAACTTCCTATTGTATGAGTTTTGGAGACCAACATAAGATAATGACTTGCCTCTGGATATGAAGGTAAAAAAACAGACACAGGCCTATGTAGTGATTTCTTACAGAACAACACAGCAGAAAGCAAATCCCTAAAAAACCATGTGGACGTGGCTTTTACAGATGGTTGTCCAATCCCTGCATGCTATTGCTTGCTTATGGATGAGTGAAAGGAATAAAAATTTTAAGTTATAGCTACAGTTTCTCTACCTGTACATTCCAATACTGACCTTGCATGGTTTCTTTGAGGGCTAAGTATGACAAAAGGATGCAGTGATTTCGAACTTAGATTTTAAAAAACAAATATGACTCTTTTGAACTGTGTGAACATAGGCACATTGCTGGATCTGAGTAATTTCATGTGCTAAGAGGGAATAATAGCATCAGCCTTAATGCTGCATTATTGCATTTAGCATTATTTCTTTCTGAAGATGAAAGAAGACAGACATCATTATATTTAACAAAGTGCCTGGCACATATAAAATAGTCAATAAATGTTATCTATCATTGCTATTATTACCTAATACTGCGCATAGTAAGCCCTGAACCTGTTCCTGGCATGTGGAGCTATGTACTATGTTCATGACATTAAACAAAGTAGTAGCTATATAATGAATATATAAATGTGACTTTTATTATTACATCCCTGTAGTTTTGGCAAGTAGTTTACTAAAAGGAAGTTCCAATTTTGACTTAGCATGGAGTTTGTTTTTATCCTGGGCATGTTATCTACCCAGCTTACCTTATTCTTCCTTTCTTCGAAACAGGAATCATGCTTGTTAATTGACTGGGATGTTGCAAGGCCTTATATCTGAGAAGTATATCATCAAGGAATAGAGAAGATGAGGCTTAGAGAGGAAATGGAAGCCCCTGTCAATTAGGAGAGGCAAAGGCTCTATGATACAGCATAGCCTAAGAATTTCGTTGATACAGAATTCTAAGTGTGAAACGAATGAATGGAGTGACCACTCCTCCCTACTAAAGAATCTTGTAAACACTAGTTTTAAAAGCACAAACGTATTATATCATATACCTATGTATAATGTCATTTTGCTACTTTTCTCTTCCAACTTCTCAAATCTTTGAATGCAGGGGTTTTTGGAGTTATTCCCTGTGTATTATTTCGACTGATATGTAATAGCTGCTCAGCGAATGTGTGTTGCTAAATAAGAGATGGAGTACAGACAAGCTGAAATTGCACATTTATGTTGCCATTGTACTGCTCAAAAAAAAAAAAAATTAGAATTAGGGTTAAAGAGAGTGCTCAGGCCCTAGACTAGGATTTATTTGCTGTATAATAAACTTTATGCAAACAATTAACCTCCCTGCTTCTCAATTTTCTCCTCTATAAAATTGGGTTATTACAAAATTCTTTGTAACATATTATGGAGTTCAATTAGGATAAGTTAAATATTGGAAATCTGAAGCATTATGCAAATATGAGGCATTATTATAATAATTATAACAATATTGTTTTTATTCTTAATTGCTACTCTTGAGTACTCTGTTGCTCTGCAGATATCTCTCTCCCTGCCTTCCCCAGGGTGTTGGCATACCAGGATGCCTCTTTAGAAAAGAAATTGCACGATGGACCTAAGGAAACAGAATTTTCCATCCTGGCATTTGCATAAAGGCCACACATGCATAGCCATATGCTGATTTAACCAACAGCTTTCACACTTATAATCGAGTTTGCTACTTGTTCTGCGATATCTACTCTCCCTCTATTTCTTATTAATAGAACAAAATTTTAGTTGGGAATATGGCTACTGAGAATAAAGATTACCTTTCTCAGCTTCTTTGCGGCTAACTCTGATTATGTGTCTAAACTTTGGTGCATATTTTGGTAAACGGAAATTCTTAAAGGGAGATTCTCTACTTTCTTCCTGCTTGCTGGAATGCAAACATGATTGTTGGATTTGAGCAGCCATCTTATGCCAGGAGTTGGAAACCATGCAGGCAGATCCTGGTCATTAGCACTACAGACCTCTATACCAAACTGGATTTCTGTGAGACTCCAGAAGAAAGTAAGCAGCACAAGGAGTTTCTTCATGTATTCTTCATTTCCCACACCCCATTATACGTGCTTTTGCTGTAATCTGGAATCAGTTGTACTAATCTACTGCACATACCTAGATTCTATTGATAGTCTATTCCAGGATTGATAACTTTGAGCCCAGATAACTTGCAGTAAGATTTATAACAAGATTTCAAAAATATTCTTTCCTATACACCAAATAGTTTTGGTTAGAGAAAACAAAACTTTTGGCATAGCAACTTCATTTGTAGGAAGTTACCTTCTTAAAATTGTTTATCTGTGGACAGCTATGCTGCTATTAGTAGGGAATGGTTTCAGGCAAAAGGTTACAGAAGGATGGAGAGGGCCTGGGCTTTGGGGTTCCAGGGGTATGGAAGTCAGCAGAGCTGAGAGTAGTTCCCAACAGCCAGAGTGTCCATGGATCAAGCCCTTTTGTGAAGCTGGAGGTACCAGCGCTGGTCCAGGATGCGCAGCTGTAAAGTTGTGAATATATGTATTTGGTCTTTTTCCTTGTTTGCTGGCCTACAACTCTTAAAATCCTTGGAATCTTCAAAGTGATGTGTCTTTTTGTATGCTAATGAGTTGACTAATGGCTGGCAGCCTCTAGGTGGCTTCTGGATAAGAGCTGGTCACCAGGAAGACCAAGGCCAGATTAGAGGGTTGGGACATTCGGTCCTACTCCGCAACCACCATGGAGACAGTCTGAAGGTTAACTTGATCACCAATGGCCAATAATTTCATCAATCATGCCAGTGTAATGAAGCCAGCATAAAAACTCAAAAGGACAGGGCTCAGAGAGTTCCATTAGCTGAACATTGGAGGTTCCCACAAGTGGCATGCCCGGAGGGGGTTATGGAAGCTTCACACCCTTTCCCCATACCTCACCCTGTGCATCTCTTCATCTGTATCTTCTGTAATATCCTTTATAATACGCCATTAAATATAAGGAAGTATTTCTCTGAGTTCTGTGAGCCACTCTACCACATTAATCGAACCCCATGGGGAAGCTGAGTAAAGTTTCAAGTGGAGTAAAATTGCTGATACCGTGACCATCAGGTCAATGTTGCTGGAAGCACAGGTAAAACAAACT

Mother

Page 19: Aug2015 analysis team spiral genetics

GAAAGTTGTGCACAATATAAAAATTATAATTTTATATTTCAAAACAAATTAAATATCTTAAAATTATAGAAGACATTAAAGAACTATATAAATCAAAGTTAGAAAAAAACCCAGATGTGAGTTGGGAAATCTGAAGAAAATTTAAAAGAGAAATTTAAAAATATTAATATAAAAAATGAAGACTAAACTGGAAAGATACACTAGAGTGAATAAACAAAATAGAAAATACTCAGATGTTTTGTTTTAACTTCCTATTGTATGAGTTTTGGAGACCAACATAAGATAATGACTTGCCTCTGGATATGAAAGTAAAAAAACAGACACAGGCCTATGTAGTGATTTCTTACAGAACAACACAGCAGAAAGCAAATCCCTAACAAACCATGTGGACGTGGCTTTTACAGATGGTTGTCCAATCCCTGCATGCTATTGCTTGCTTATGGATGAGTGAAAGGAATAAAAATTTTAAGTTATAGCTACAGTTTCTCTACCTGTACATTCCAATACTGACCTTGCATGGTTTCTTTGAGGGCTAAGTATGACAAAAGGATGCAGTGATTTCGAACTTAGATTTTAAAAAACAAATATGACTCTTTTGAACTGTGTGAACATAGGCACATTGCTGGATCTGAGTAATTTCATGTGCTAAGAGGGAATAATAGCATCAGCCTTAATGCTGCATTATTGCATTTAGCATTATTTCTTTCTGAAGATGAAAGAAGACAGACATCATTATATTTAACAAAGTGCCTGGCACATATAAAATAGTCAATAAATGTTATCTATCATTGCTATTATTACCTAATACTGCGCATAGTAAGCCCTGAACCTGTTCCTGGCATGTGGAGCCATGTACTATGTTCATGACATTAAACAAAGTAGTAGCTATATAATGAATATATAAATGTGACTTTTATTATTACATCCCTGTAGTTTTGGCAAGTAGTTTACTAAAAGGAAGTTCCAATTTTGACTTAGCATGGAGTTTGTTTTTATCCTGGGCATGTTATCTACCCAGCTTACCTTATTCTTCCTTTCTTCGAAACAGGAATCATGCTTGTTAATTGACTGGGATGTTGCAAGGCCTTATATCTGAGAAGTATATCATCAAGGAATAGAGAAGATGAGTCTTAGAGAGGAAATGGAAGCCCCTGTCAATTAGGAGAGGCAAAGGCTCTATGATACAGCATAGCCTAAGAATTTCGTTGATACAGAATTCTAAGTGTGAAACGAATGAATGGAGTGACCACTCCTCCCTACTAAAGAATCTTGTAAACACTAGTTTTAAAAGCACAAACGTATTATATCATATACCTATGTATAATGTCATTTTGCTACTTTTCTCTTCCAACTTCTCAAATCTTTGAATGCAGGGGTTTTTGGAGTTATTCCCTGTGTATTATTTCGACTGATATGTAATAGCTGCTCAGCGAATGTGTGTTGCTAAATAAGAGATGGAGTACAGACAAGCTGAAATTGCACATTTATGTTGCCATTGTACTGCTCAAAAAAAAAAAAAAATTAGAATTAGGGTTAAAGAGAGTGCTCAGGCCCTAGACTAGGATTTATTTGCTGTATAATAAACTTTATGCAAACAATTAACCTCCCTGCTTCTCAATTTTCTCCTCTATAAAATTGGGTTATTACAAAATTCTTTGTAACATATTATGGAGTTCAATTAGGATAAGTTAAATATTGGAAATCTGAAGCATTATGCAAATATGAGGCATTATTATAATAATTATAACAATATTGTTTTTATTCTTAATTGCTACTCTTGAGTACTCTGTTGCTCTGCAGATATCTCTCTCCCTGCCTTCCCCAGGGTGTTGGCATACCAGGATGCCTCTTTAGAAAAGAAATTGTACGATCGACCTAAGGAAACAGAATTTTCCATCCTGGCATTTGCATAAAGGCCACACATGCATAGCCATATGCTGATTTAACCAACAGCTTTCACACTTATAATCGAGTTTGCTACTTGTTCTGCGATATCTACTCTCCCTCTATTTCTTATTAATAGAACAAAATTTTAGTTGGGAATATGGCTACTGAGAATAAAGATTACCTTTCTCAGCTTCTTTGCGGCTAACTCTGATTATGTGTCTAAACTTTGGTGCATATTTTGGTAAACGGAAATTCTTAAAGGGAGATTCTCTACTTTCTTCCTGCTTGCTGGAATGCAAACATGATTGTTGGATTTGAGCAGCCATCTTATGCCAGGAGTTGGAAACCATGCAGGCAGATCCTGGTCATTAGCACTACAGACCTCTATACCAAACTGGATTTCTGTGAGACTCCAGAAGAAAGTAAGCAGCACAAGGAGTTTCTTCATGTATTCTTCATTTCCCACACCCCATTATACGTGCTTTTGCTGTAATCTGGAATCAGTTGTACTAATCTACTGCACATACCTAGATTCTATTGATAGTCTATTCCAGGATTGATAACTTTGAGCCCAGATAACTTGCAGTAAGATTTATAACAAGATTTCAAAAATATTCTTTCCTATACACCAAATAGTTTTGGTTAGAGAAAACAAAACTTTTGGCATAGCAACTTCATTTGTAGGAAGTTACCTTCTTAAAATTGTTTATCTGTGGACAGCTATGCTGCTATTAGTAGGGAATGGTTTCAGGCAAGAGGTTACAGAAGGATGGAGAGGGCCTGGGCTTTGGGGTTCCAGGGGTATGGAAGTCAGCAGAGCTGAGAGTAGTTCCCAACAGCCAGAGTGTCCATGGATCAAGCCCTTTTGTGAAGCTGGAGGTACCAGCGCTGGTCCAGGATGCGCAGCTGTAAAGTTGTGAATATATGTATTTGGTCTTTTTCCTTGTTTGCTGGCCTACAACTCTTAAAATCCTTGGAATCTTCAAAGTGATGTGTCTTTTTGTATGCTAATGAGTTGACTAATGGCTGGCAGCCTCTAGGTGGCTTCTGGATAAGAGCTGGTCACCAGGAAGACCAAGGCCAGATTAGAGGGTTGGGACATTCGGTCCTACTCCGCAACCACCATGGAGACAGTCTGAAGGTTAACTTGATCACCAATGGCCAATAATTTCATCAATCATGCCAGTGTAATGAAGCCAGCATAAAAACTCAAAAGGACAGGGCTCAGAGAGTTCCATTAGCTGAACATTGGAGGTTCCCACAAGTGGCATGCCCGGAGGGGGTTATGGAAGCTTCACACCCTTTCCCCATACCTCACCCTGTGCATCTCTTCATCTGTATCTTCTGTAATATCCTTTATAATACGCCATTAAATATAAGGAAGTATTTCTCTGAGTTCTGTGAGCCACTCTACCACATTAATCGAACCCCATGGGGAAGCTGAGTAAAGTTTCAAGTGGAGTAAAATTGCTGATACCGTGACCATCAGGTCAATGTTGCTGGAAGCACAGGTAAAACAACCT

Chr 2 Insertion - VCF IdenticalAA

FatherHG003

chr8 129739066 AATAAA 100 masking_present NS=1;DP=47;SVTYPE=INS; END=129739071;SVLEN=3405;DP:AD 47:32,15

Page 20: Aug2015 analysis team spiral genetics

Chr 2 Insertion - VCF IdenticalAA

HG002

chr8 129739066 AATAAA 100 masking_present NS=1;DP=18;SVTYPE=INS; END=129739071;SVLEN=3405;DP:AD 18:0,18

GAAAGTTGTGCACAATATAAAAATTATAATTTTATATTTCAAAACAAATTAAATATCTTAAAATTATAGAAGACATTAAAGAACTATATAAATCAAAGTTAGAAAAAAACCCAGATGTGAGTTGGGAAATCTGAAGAAAATTTAAAAGAGAAATTTAAAAATATTAATATAAAAAATGAAGACTAAACTGGAAAGATACACTAGAGTGAATAAACAAAATAGAAAATACTCAGATGTTTTGTTTTAACTTCCTATTGTATGAGTTTTGGAGACCAACATAAGATAATGACTTGCCTCTGGATATGAAGGTAAAAAAACAGACACAGGCCTATGTAGTGATTTCTTACAGAACAACACAGCAGAAAGCAAATCCCTAAAAAACCATGTGGACGTGGCTTTTACAGATGGTTGTCCAATCCCTGCATGCTATTGCTTGCTTATGGATGAGTGAAAGGAATAAAAATTTTAAGTTATAGCTACAGTTTCTCTACCTGTACATTCCAATACTGACCTTGCATGGTTTCTTTGAGGGCTAAGTATGACAAAAGGATGCAGTGATTTCGAACTTAGATTTTAAAAAACAAATATGACTCTTTTGAACTGTGTGAACATAGGCACATTGCTGGATCTGAGTAATTTCATGTGCTAAGAGGGAATAATAGCATCAGCCTTAATGCTGCATTATTGCATTTAGCATTATTTCTTTCTGAAGATGAAAGAAGACAGACATCATTATATTTAACAAAGTGCCTGGCACATATAAAATAGTCAATAAATGTTATCTATCATTGCTATTATTACCTAATACTGCGCATAGTAAGCCCTGAACCTGTTCCTGGCATGTGGAGCTATGTACTATGTTCATGACATTAAACAAAGTAGTAGCTATATAATGAATATATAAATGTGACTTTTATTATTACATCCCTGTAGTTTTGGCAAGTAGTTTACTAAAAGGAAGTTCCAATTTTGACTTAGCATGGAGTTTGTTTTTATCCTGGGCATGTTATCTACCCAGCTTACCTTATTCTTCCTTTCTTCGAAACAGGAATCATGCTTGTTAATTGACTGGGATGTTGCAAGGCCTTATATCTGAGAAGTATATCATCAAGGAATAGAGAAGATGAGTCTTAGAGAGGAAATGGAAGCCCCTGTCAATTAGGAGAGGCAAAGGCTCTATGATACAGCATAGCCTAAGAATTTCGTTGATACAGAATTCTAAGTGTGAAACGAATGAATGGAGTGACCACTCCTCCCTACTAAAGAATCTTGTAAACACTAGTTTTAAAAGCACAAACGTATTATATCATATACCTATGTATAATGTCATTTTGCTACTTTTCTCTTCCAACTTCTCAAATCTTTGAATGCAGGGGTTTTTGGAGTTATTCCCTGTGTATTATTTCGACTGATATGTAATAGCTGCTCAGCGAATGTGTGTTGCTAAATAAGAGATGGAGTACAGACAAGCTGAAATTGCACATTTATGTTGCCATTGTACTGCTCAAAAAAAAAAAAAAATTAGAATTAGGGTTAAAGAGAGTGCTCAGGCCCTAGACTAGGATTTATTTGCTGTATAATAAACTTTATGCAAACAATTAACCTCCCTGCTTCTCAATTTTCTCCTCTATAAAATTGGGTTATTACAAAATTCTTTGTAACATATTATGGAGTTCAATTAGGATAAGTTAAATATTGGAAATCTGAAGCATTATGCAAATATGAGGCATTATTATAATAATTATAACAATATTGTTTTTATTCTTAATTGCTACTCTTGAGTACTCTGTTGCTCTGCAGATATCTCTCTCCCTGCCTTCCCCAGGGTGTTGGCATACCAGGATGCCTCTTTAGAAAAGAAATTGCACGATGGACCTAAGGAAACAGAATTTTCCATCCTGGCATTTGCATAAAGGCCACACATGCATAGCCATATGCTGATTTAACCAACAGCTTTCACACTTATAATCGAGTTTGCTACTTGTTCTGCGATATCTACTCTCCCTCTATTTCTTATTAATAGAACAAAATTTTAGTTGGGAATATGGCTACTGAGAATAAAGATTACCTTTCTCAGCTTCTTTGCGGCTAACTCTGATTATGTGTCTAAACTTTGGTGCATATTTTGGTAAACGGAAATTCTTAAAGGGAGATTCTCTACTTTCTTCCTGCTTGCTGGAATGCAAACATGATTGTTGGATTTGAGCAGCCATCTTATGCCAGGAGTTGGAAACCATGCAGGCAGATCCTGGTCATTAGCACTACAGACCTCTATACCAAACTGGATTTCTGTGAGACTCCAGAAGAAAGTAAGCAGCACAAGGAGTTTCTTCATGTATTCTTCATTTCCCACACCCCATTATACGTGCTTTTGCTGTAATCTGGAATCAGTTGTACTAATCTACTGCACATACCTAGATTCTATTGATAGTCTATTCCAGGATTGATAACTTTGAGCCCAGATAACTTGCAGTAAGATTTATAACAAGATTTCAAAAATATTCTTTCCTATACACCAAATAGTTTTGGTTAGAGAAAACAAAACTTTTGGCATAGCAACTTCATTTGTAGGAAGTTACCTTCTTAAAATTGTTTATCTGTGGACAGCTATGCTGCTATTAGTAGGGAATGGTTTCAGGCAAAAGGTTACAGAAGGATGGAGAGGGCCTGGGCTTTGGGGTTCCAGGGGTATGGAAGTCAGCAGAGCTGAGAGTAGTTCCCAACAGCCAGAGTGTCCATGGATCAAGCCCTTTTGTGAAGCTGGAGGTACCAGCGCTGGTCCAGGATGCGCAGCTGTAAAGTTGTGAATATATGTATTTGGTCTTTTTCCTTGTTTGCTGGCCTACAACTCTTAAAATCCTTGGAATCTTCAAAGTGATGTGTCTTTTTGTATGCTAATGAGTTGACTAATGGCTGGCAGCCTCTAGGTGGCTTCTGGATAAGAGCTGGTCACCAGGAAGACCAAGGCCAGATTAGAGGGTTGGGACATTCGGTCCTACTCCGCAACCACCATGGAGACAGTCTGAAGGTTAACTTGATCACCAATGGCCAATAATTTCATCAATCATGCCAGTGTAATGAAGCCAGCATAAAAACTCAAAAGGACAGGGCTCAGAGAGTTCCATTAGCTGAACATTGGAGGTTCCCACAAGTGGCATGCCCGGAGGGGGTTATGGAAGCTTCACACCCTTTCCCCATACCTCACCCTGTGCATCTCTTCATCTGTATCTTCTGTAATATCCTTTATAATACGCCATTAAATATAAGGAAGTATTTCTCTGAGTTCTGTGAGCCACTCTACCACATTAATCGAACCCCATGGGGAAGCTGAGTAAAGTTTCAAGTGGAGTAAAATTGCTGATACCGTGACCATCAGGTCAATGTTGCTGGAAGCACAGGTAAAACAACCT

Offspring

Page 21: Aug2015 analysis team spiral genetics

Chr 2 Insertion - VCF IdenticalAA

GAAAGTTGTGCACAATATAAAAATTATAATTTTATATTTCAAAACAAATTAAATATCTTAAAATTATAGAAGACATTAAAGAACTATATAAATCAAAGTTAGAAAAAAACCCAGATGTGAGTTGGGAAATCTGAAGAAAATTTAAAAGAGAAATTTAAAAATATTAATATAAAAAATGAAGACTAAACTGGAAAGATACACTAGAGTGAATAAACAAAATAGAAAATACTCAGATGTTTTGTTTTAACTTCCTATTGTATGAGTTTTGGAGACCAACATAAGATAATGACTTGCCTCTGGATATGAAGGTAAAAAAACAGACACAGGCCTATGTAGTGATTTCTTACAGAACAACACAGCAGAAAGCAAATCCCTAAAAAACCATGTGGACGTGGCTTTTACAGATGGTTGTCCAATCCCTGCATGCTATTGCTTGCTTATGGATGAGTGAAAGGAATAAAAATTTTAAGTTATAGCTACAGTTTCTCTACCTGTACATTCCAATACTGACCTTGCATGGTTTCTTTGAGGGCTAAGTATGACAAAAGGATGCAGTGATTTCGAACTTAGATTTTAAAAAACAAATATGACTCTTTTGAACTGTGTGAACATAGGCACATTGCTGGATCTGAGTAATTTCATGTGCTAAGAGGGAATAATAGCATCAGCCTTAATGCTGCATTATTGCATTTAGCATTATTTCTTTCTGAAGATGAAAGAAGACAGACATCATTATATTTAACAAAGTGCCTGGCACATATAAAATAGTCAATAAATGTTATCTATCATTGCTATTATTACCTAATACTGCGCATAGTAAGCCCTGAACCTGTTCCTGGCATGTGGAGCTATGTACTATGTTCATGACATTAAACAAAGTAGTAGCTATATAATGAATATATAAATGTGACTTTTATTATTACATCCCTGTAGTTTTGGCAAGTAGTTTACTAAAAGGAAGTTCCAATTTTGACTTAGCATGGAGTTTGTTTTTATCCTGGGCATGTTATCTACCCAGCTTACCTTATTCTTCCTTTCTTCGAAACAGGAATCATGCTTGTTAATTGACTGGGATGTTGCAAGGCCTTATATCTGAGAAGTATATCATCAAGGAATAGAGAAGATGAGGCTTAGAGAGGAAATGGAAGCCCCTGTCAATTAGGAGAGGCAAAGGCTCTATGATACAGCATAGCCTAAGAATTTCGTTGATACAGAATTCTAAGTGTGAAACGAATGAATGGAGTGACCACTCCTCCCTACTAAAGAATCTTGTAAACACTAGTTTTAAAAGCACAAACGTATTATATCATATACCTATGTATAATGTCATTTTGCTACTTTTCTCTTCCAACTTCTCAAATCTTTGAATGCAGGGGTTTTTGGAGTTATTCCCTGTGTATTATTTCGACTGATATGTAATAGCTGCTCAGCGAATGTGTGTTGCTAAATAAGAGATGGAGTACAGACAAGCTGAAATTGCACATTTATGTTGCCATTGTACTGCTCAAAAAAAAAAAAAATTAGAATTAGGGTTAAAGAGAGTGCTCAGGCCCTAGACTAGGATTTATTTGCTGTATAATAAACTTTATGCAAACAATTAACCTCCCTGCTTCTCAATTTTCTCCTCTATAAAATTGGGTTATTACAAAATTCTTTGTAACATATTATGGAGTTCAATTAGGATAAGTTAAATATTGGAAATCTGAAGCATTATGCAAATATGAGGCATTATTATAATAATTATAACAATATTGTTTTTATTCTTAATTGCTACTCTTGAGTACTCTGTTGCTCTGCAGATATCTCTCTCCCTGCCTTCCCCAGGGTGTTGGCATACCAGGATGCCTCTTTAGAAAAGAAATTGCACGATGGACCTAAGGAAACAGAATTTTCCATCCTGGCATTTGCATAAAGGCCACACATGCATAGCCATATGCTGATTTAACCAACAGCTTTCACACTTATAATCGAGTTTGCTACTTGTTCTGCGATATCTACTCTCCCTCTATTTCTTATTAATAGAACAAAATTTTAGTTGGGAATATGGCTACTGAGAATAAAGATTACCTTTCTCAGCTTCTTTGCGGCTAACTCTGATTATGTGTCTAAACTTTGGTGCATATTTTGGTAAACGGAAATTCTTAAAGGGAGATTCTCTACTTTCTTCCTGCTTGCTGGAATGCAAACATGATTGTTGGATTTGAGCAGCCATCTTATGCCAGGAGTTGGAAACCATGCAGGCAGATCCTGGTCATTAGCACTACAGACCTCTATACCAAACTGGATTTCTGTGAGACTCCAGAAGAAAGTAAGCAGCACAAGGAGTTTCTTCATGTATTCTTCATTTCCCACACCCCATTATACGTGCTTTTGCTGTAATCTGGAATCAGTTGTACTAATCTACTGCACATACCTAGATTCTATTGATAGTCTATTCCAGGATTGATAACTTTGAGCCCAGATAACTTGCAGTAAGATTTATAACAAGATTTCAAAAATATTCTTTCCTATACACCAAATAGTTTTGGTTAGAGAAAACAAAACTTTTGGCATAGCAACTTCATTTGTAGGAAGTTACCTTCTTAAAATTGTTTATCTGTGGACAGCTATGCTGCTATTAGTAGGGAATGGTTTCAGGCAAAAGGTTACAGAAGGATGGAGAGGGCCTGGGCTTTGGGGTTCCAGGGGTATGGAAGTCAGCAGAGCTGAGAGTAGTTCCCAACAGCCAGAGTGTCCATGGATCAAGCCCTTTTGTGAAGCTGGAGGTACCAGCGCTGGTCCAGGATGCGCAGCTGTAAAGTTGTGAATATATGTATTTGGTCTTTTTCCTTGTTTGCTGGCCTACAACTCTTAAAATCCTTGGAATCTTCAAAGTGATGTGTCTTTTTGTATGCTAATGAGTTGACTAATGGCTGGCAGCCTCTAGGTGGCTTCTGGATAAGAGCTGGTCACCAGGAAGACCAAGGCCAGATTAGAGGGTTGGGACATTCGGTCCTACTCCGCAACCACCATGGAGACAGTCTGAAGGTTAACTTGATCACCAATGGCCAATAATTTCATCAATCATGCCAGTGTAATGAAGCCAGCATAAAAACTCAAAAGGACAGGGCTCAGAGAGTTCCATTAGCTGAACATTGGAGGTTCCCACAAGTGGCATGCCCGGAGGGGGTTATGGAAGCTTCACACCCTTTCCCCATACCTCACCCTGTGCATCTCTTCATCTGTATCTTCTGTAATATCCTTTATAATACGCCATTAAATATAAGGAAGTATTTCTCTGAGTTCTGTGAGCCACTCTACCACATTAATCGAACCCCATGGGGAAGCTGAGTAAAGTTTCAAGTGGAGTAAAATTGCTGATACCGTGACCATCAGGTCAATGTTGCTGGAAGCACAGGTAAAACAAACT

Overlay - SNPs from both parents are present in the Offspring

GAAAGTTGTGCACAATATAAAAATTATAATTTTATATTTCAAAACAAATTAAATATCTTAAAATTATAGAAGACATTAAAGAACTATATAAATCAAAGTTAGAAAAAAACCCAGATGTGAGTTGGGAAATCTGAAGAAAATTTAAAAGAGAAATTTAAAAATATTAATATAAAAAATGAAGACTAAACTGGAAAGATACACTAGAGTGAATAAACAAAATAGAAAATACTCAGATGTTTTGTTTTAACTTCCTATTGTATGAGTTTTGGAGACCAACATAAGATAATGACTTGCCTCTGGATATGAAAGTAAAAAAACAGACACAGGCCTATGTAGTGATTTCTTACAGAACAACACAGCAGAAAGCAAATCCCTAACAAACCATGTGGACGTGGCTTTTACAGATGGTTGTCCAATCCCTGCATGCTATTGCTTGCTTATGGATGAGTGAAAGGAATAAAAATTTTAAGTTATAGCTACAGTTTCTCTACCTGTACATTCCAATACTGACCTTGCATGGTTTCTTTGAGGGCTAAGTATGACAAAAGGATGCAGTGATTTCGAACTTAGATTTTAAAAAACAAATATGACTCTTTTGAACTGTGTGAACATAGGCACATTGCTGGATCTGAGTAATTTCATGTGCTAAGAGGGAATAATAGCATCAGCCTTAATGCTGCATTATTGCATTTAGCATTATTTCTTTCTGAAGATGAAAGAAGACAGACATCATTATATTTAACAAAGTGCCTGGCACATATAAAATAGTCAATAAATGTTATCTATCATTGCTATTATTACCTAATACTGCGCATAGTAAGCCCTGAACCTGTTCCTGGCATGTGGAGCCATGTACTATGTTCATGACATTAAACAAAGTAGTAGCTATATAATGAATATATAAATGTGACTTTTATTATTACATCCCTGTAGTTTTGGCAAGTAGTTTACTAAAAGGAAGTTCCAATTTTGACTTAGCATGGAGTTTGTTTTTATCCTGGGCATGTTATCTACCCAGCTTACCTTATTCTTCCTTTCTTCGAAACAGGAATCATGCTTGTTAATTGACTGGGATGTTGCAAGGCCTTATATCTGAGAAGTATATCATCAAGGAATAGAGAAGATGAGTCTTAGAGAGGAAATGGAAGCCCCTGTCAATTAGGAGAGGCAAAGGCTCTATGATACAGCATAGCCTAAGAATTTCGTTGATACAGAATTCTAAGTGTGAAACGAATGAATGGAGTGACCACTCCTCCCTACTAAAGAATCTTGTAAACACTAGTTTTAAAAGCACAAACGTATTATATCATATACCTATGTATAATGTCATTTTGCTACTTTTCTCTTCCAACTTCTCAAATCTTTGAATGCAGGGGTTTTTGGAGTTATTCCCTGTGTATTATTTCGACTGATATGTAATAGCTGCTCAGCGAATGTGTGTTGCTAAATAAGAGATGGAGTACAGACAAGCTGAAATTGCACATTTATGTTGCCATTGTACTGCTCAAAAAAAAAAAAAATTAGAATTAGGGTTAAAGAGAGTGCTCAGGCCCTAGACTAGGATTTATTTGCTGTATAATAAACTTTATGCAAACAATTAACCTCCCTGCTTCTCAATTTTCTCCTCTATAAAATTGGGTTATTACAAAATTCTTTGTAACATATTATGGAGTTCAATTAGGATAAGTTAAATATTGGAAATCTGAAGCATTATGCAAATATGAGGCATTATTATAATAATTATAACAATATTGTTTTTATTCTTAATTGCTACTCTTGAGTACTCTGTTGCTCTGCAGATATCTCTCTCCCTGCCTTCCCCAGGGTGTTGGCATACCAGGATGCCTCTTTAGAAAAGAAATTGTACGATCGACCTAAGGAAACAGAATTTTCCATCCTGGCATTTGCATAAAGGCCACACATGCATAGCCATATGCTGATTTAACCAACAGCTTTCACACTTATAATCGAGTTTGCTACTTGTTCTGCGATATCTACTCTCCCTCTATTTCTTATTAATAGAACAAAATTTTAGTTGGGAATATGGCTACTGAGAATAAAGATTACCTTTCTCAGCTTCTTTGCGGCTAACTCTGATTATGTGTCTAAACTTTGGTGCATATTTTGGTAAACGGAAATTCTTAAAGGGAGATTCTCTACTTTCTTCCTGCTTGCTGGAATGCAAACATGATTGTTGGATTTGAGCAGCCATCTTATGCCAGGAGTTGGAAACCATGCAGGCAGATCCTGGTCATTAGCACTACAGACCTCTATACCAAACTGGATTTCTGTGAGACTCCAGAAGAAAGTAAGCAGCACAAGGAGTTTCTTCATGTATTCTTCATTTCCCACACCCCATTATACGTGCTTTTGCTGTAATCTGGAATCAGTTGTACTAATCTACTGCACATACCTAGATTCTATTGATAGTCTATTCCAGGATTGATAACTTTGAGCCCAGATAACTTGCAGTAAGATTTATAACAAGATTTCAAAAATATTCTTTCCTATACACCAAATAGTTTTGGTTAGAGAAAACAAAACTTTTGGCATAGCAACTTCATTTGTAGGAAGTTACCTTCTTAAAATTGTTTATCTGTGGACAGCTATGCTGCTATTAGTAGGGAATGGTTTCAGGCAAGAGGTTACAGAAGGATGGAGAGGGCCTGGGCTTTGGGGTTCCAGGGGTATGGAAGTCAGCAGAGCTGAGAGTAGTTCCCAACAGCCAGAGTGTCCATGGATCAAGCCCTTTTGTGAAGCTGGAGGTACCAGCGCTGGTCCAGGATGCGCAGCTGTAAAGTTGTGAATATATGTATTTGGTCTTTTTCCTTGTTTGCTGGCCTACAACTCTTAAAATCCTTGGAATCTTCAAAGTGATGTGTCTTTTTGTATGCTAATGAGTTGACTAATGGCTGGCAGCCTCTAGGTGGCTTCTGGATAAGAGCTGGTCACCAGGAAGACCAAGGCCAGATTAGAGGGTTGGGACATTCGGTCCTACTCCGCAACCACCATGGAGACAGTCTGAAGGTTAACTTGATCACCAATGGCCAATAATTTCATCAATCATGCCAGTGTAATGAAGCCAGCATAAAAACTCAAAAGGACAGGGCTCAGAGAGTTCCATTAGCTGAACATTGGAGGTTCCCACAAGTGGCATGCCCGGAGGGGGTTATGGAAGCTTCACACCCTTTCCCCATACCTCACCCTGTGCATCTCTTCATCTGTATCTTCTGTAATATCCTTTATAATACGCCATTAAATATAAGGAAGTATTTCTCTGAGTTCTGTGAGCCACTCTACCACATTAATCGAACCCCATGGGGAAGCTGAGTAAAGTTTCAAGTGGAGTAAAATTGCTGATACCGTGACCATCAGGTCAATGTTGCTGGAAGCACAGGTAAAACAACCT

GAAAGTTGTGCACAATATAAAAATTATAATTTTATATTTCAAAACAAATTAAATATCTTAAAATTATAGAAGACATTAAAGAACTATATAAATCAAAGTTAGAAAAAAACCCAGATGTGAGTTGGGAAATCTGAAGAAAATTTAAAAGAGAAATTTAAAAATATTAATATAAAAAATGAAGACTAAACTGGAAAGATACACTAGAGTGAATAAACAAAATAGAAAATACTCAGATGTTTTGTTTTAACTTCCTATTGTATGAGTTTTGGAGACCAACATAAGATAATGACTTGCCTCTGGATATGAAGGTAAAAAAACAGACACAGGCCTATGTAGTGATTTCTTACAGAACAACACAGCAGAAAGCAAATCCCTAAAAAACCATGTGGACGTGGCTTTTACAGATGGTTGTCCAATCCCTGCATGCTATTGCTTGCTTATGGATGAGTGAAAGGAATAAAAATTTTAAGTTATAGCTACAGTTTCTCTACCTGTACATTCCAATACTGACCTTGCATGGTTTCTTTGAGGGCTAAGTATGACAAAAGGATGCAGTGATTTCGAACTTAGATTTTAAAAAACAAATATGACTCTTTTGAACTGTGTGAACATAGGCACATTGCTGGATCTGAGTAATTTCATGTGCTAAGAGGGAATAATAGCATCAGCCTTAATGCTGCATTATTGCATTTAGCATTATTTCTTTCTGAAGATGAAAGAAGACAGACATCATTATATTTAACAAAGTGCCTGGCACATATAAAATAGTCAATAAATGTTATCTATCATTGCTATTATTACCTAATACTGCGCATAGTAAGCCCTGAACCTGTTCCTGGCATGTGGAGCTATGTACTATGTTCATGACATTAAACAAAGTAGTAGCTATATAATGAATATATAAATGTGACTTTTATTATTACATCCCTGTAGTTTTGGCAAGTAGTTTACTAAAAGGAAGTTCCAATTTTGACTTAGCATGGAGTTTGTTTTTATCCTGGGCATGTTATCTACCCAGCTTACCTTATTCTTCCTTTCTTCGAAACAGGAATCATGCTTGTTAATTGACTGGGATGTTGCAAGGCCTTATATCTGAGAAGTATATCATCAAGGAATAGAGAAGATGAGTCTTAGAGAGGAAATGGAAGCCCCTGTCAATTAGGAGAGGCAAAGGCTCTATGATACAGCATAGCCTAAGAATTTCGTTGATACAGAATTCTAAGTGTGAAACGAATGAATGGAGTGACCACTCCTCCCTACTAAAGAATCTTGTAAACACTAGTTTTAAAAGCACAAACGTATTATATCATATACCTATGTATAATGTCATTTTGCTACTTTTCTCTTCCAACTTCTCAAATCTTTGAATGCAGGGGTTTTTGGAGTTATTCCCTGTGTATTATTTCGACTGATATGTAATAGCTGCTCAGCGAATGTGTGTTGCTAAATAAGAGATGGAGTACAGACAAGCTGAAATTGCACATTTATGTTGCCATTGTACTGCTCAAAAAAAAAAAAAATTAGAATTAGGGTTAAAGAGAGTGCTCAGGCCCTAGACTAGGATTTATTTGCTGTATAATAAACTTTATGCAAACAATTAACCTCCCTGCTTCTCAATTTTCTCCTCTATAAAATTGGGTTATTACAAAATTCTTTGTAACATATTATGGAGTTCAATTAGGATAAGTTAAATATTGGAAATCTGAAGCATTATGCAAATATGAGGCATTATTATAATAATTATAACAATATTGTTTTTATTCTTAATTGCTACTCTTGAGTACTCTGTTGCTCTGCAGATATCTCTCTCCCTGCCTTCCCCAGGGTGTTGGCATACCAGGATGCCTCTTTAGAAAAGAAATTGCACGATGGACCTAAGGAAACAGAATTTTCCATCCTGGCATTTGCATAAAGGCCACACATGCATAGCCATATGCTGATTTAACCAACAGCTTTCACACTTATAATCGAGTTTGCTACTTGTTCTGCGATATCTACTCTCCCTCTATTTCTTATTAATAGAACAAAATTTTAGTTGGGAATATGGCTACTGAGAATAAAGATTACCTTTCTCAGCTTCTTTGCGGCTAACTCTGATTATGTGTCTAAACTTTGGTGCATATTTTGGTAAACGGAAATTCTTAAAGGGAGATTCTCTACTTTCTTCCTGCTTGCTGGAATGCAAACATGATTGTTGGATTTGAGCAGCCATCTTATGCCAGGAGTTGGAAACCATGCAGGCAGATCCTGGTCATTAGCACTACAGACCTCTATACCAAACTGGATTTCTGTGAGACTCCAGAAGAAAGTAAGCAGCACAAGGAGTTTCTTCATGTATTCTTCATTTCCCACACCCCATTATACGTGCTTTTGCTGTAATCTGGAATCAGTTGTACTAATCTACTGCACATACCTAGATTCTATTGATAGTCTATTCCAGGATTGATAACTTTGAGCCCAGATAACTTGCAGTAAGATTTATAACAAGATTTCAAAAATATTCTTTCCTATACACCAAATAGTTTTGGTTAGAGAAAACAAAACTTTTGGCATAGCAACTTCATTTGTAGGAAGTTACCTTCTTAAAATTGTTTATCTGTGGACAGCTATGCTGCTATTAGTAGGGAATGGTTTCAGGCAAAAGGTTACAGAAGGATGGAGAGGGCCTGGGCTTTGGGGTTCCAGGGGTATGGAAGTCAGCAGAGCTGAGAGTAGTTCCCAACAGCCAGAGTGTCCATGGATCAAGCCCTTTTGTGAAGCTGGAGGTACCAGCGCTGGTCCAGGATGCGCAGCTGTAAAGTTGTGAATATATGTATTTGGTCTTTTTCCTTGTTTGCTGGCCTACAACTCTTAAAATCCTTGGAATCTTCAAAGTGATGTGTCTTTTTGTATGCTAATGAGTTGACTAATGGCTGGCAGCCTCTAGGTGGCTTCTGGATAAGAGCTGGTCACCAGGAAGACCAAGGCCAGATTAGAGGGTTGGGACATTCGGTCCTACTCCGCAACCACCATGGAGACAGTCTGAAGGTTAACTTGATCACCAATGGCCAATAATTTCATCAATCATGCCAGTGTAATGAAGCCAGCATAAAAACTCAAAAGGACAGGGCTCAGAGAGTTCCATTAGCTGAACATTGGAGGTTCCCACAAGTGGCATGCCCGGAGGGGGTTATGGAAGCTTCACACCCTTTCCCCATACCTCACCCTGTGCATCTCTTCATCTGTATCTTCTGTAATATCCTTTATAATACGCCATTAAATATAAGGAAGTATTTCTCTGAGTTCTGTGAGCCACTCTACCACATTAATCGAACCCCATGGGGAAGCTGAGTAAAGTTTCAAGTGGAGTAAAATTGCTGATACCGTGACCATCAGGTCAATGTTGCTGGAAGCACAGGTAAAACAACCT

A

Page 22: Aug2015 analysis team spiral genetics

What is GraphBWT?

Page 23: Aug2015 analysis team spiral genetics

Core Technology to Make Anchored Assembly Feasible

• Needed a way to represent the read data that was graph based

• Fast search for variation from reference directly from the reads in a whole genome dataset

• Small enough footprint to store a read overlap graph of whole human genome in memory

23

Page 24: Aug2015 analysis team spiral genetics

GraphBWT

• Technology for storing all of the reads that comprise the variation graph of a whole human genome

• Very compact to fit into memory (1.5 bytes per base)

• In memory, allows for extremely fast searches via subsequence

24

Page 25: Aug2015 analysis team spiral genetics

Resulting technologies that use GraphBWT

Page 26: Aug2015 analysis team spiral genetics

SpEC SV and Query

• SpEC: A lossless compression format that reduces BAM files to 50% of their original size and that can be analyzed with existing bioinformatics tools while compressed

• SpEC SV: SpEC that also includes a compact sequence index, known as a GraphBWT (3GB), which is a graph based representation of genomic variation

• SpEC Query: an API that reads SpEC SV files to enable rapid queries of sequence data via location or by a subsequence

26

Page 27: Aug2015 analysis team spiral genetics

27

Create a SpEC SV File

Spiral’s SpEC SV File

Page 28: Aug2015 analysis team spiral genetics

Query Times

28

Samples,  Variant  calls SpEC  Query  using  SpEC SpEC  Query  using  SpEC  SV

1  sample,  1  variant Milliseconds Milliseconds

1  sample,  1M  variants 10  Minutes 5  Minutes

1000  samples,  1  variant 10-­‐20  Minutes 10-­‐20  Minutes

1000  samples,  1M  variants 4  Days 2  Days

Variant  types SNPs  and  Indels SNPs,  Indels  and  SVs

Page 29: Aug2015 analysis team spiral genetics

GraphBWT Technical Details

• Constant time traversal of k-mer graph for any sized k-mer

• Subsequence search linear with size of sequence

• Storage requirements grow linearly with size of novel sequence (i.e. variation)

29

Page 30: Aug2015 analysis team spiral genetics

Use Cases for SpEC SV

• Search for evidence of variation in read data

• Compare graphs between individuals for unique variation

• Compare combined graphs of two groups

• Store variation, for example a reference genome

30

Page 31: Aug2015 analysis team spiral genetics

Questions?

Niranjan Shekar - VP of Bioinformatics [email protected]