SVs hackathon group report

SVs Hackathon group

Transcript of SVs hackathon group report

Page 1: SVs hackathon group report

SVs Hackathon group

Page 2: SVs hackathon group report

10x Genomics

● Library preparation system for Illumina reads

● WGS, single cell, exome sequencing, RNA-Seq

● Assigns the same barcode to read pairs that originate from the same molecule

● Software:

○ Longranger (mapping + SNPs)

○ Supernova (assembly)

● Rising star in the field -> More (specialized) software will be required
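
A minimal sketch of what the shared barcode gives downstream software: reads with the same barcode that map close together can be grouped into one putative source molecule. The input format (tuples of barcode, chromosome, position) and the `max_gap` threshold are assumptions for illustration, not taken from Longranger. In practice the barcode would be read from the BX tag of the aligned reads.

```python
from collections import defaultdict

def group_by_barcode(reads, max_gap=50_000):
    """Cluster read positions that share a barcode into putative molecules.

    reads: iterable of (barcode, chrom, pos) tuples (hypothetical input format).
    max_gap: assumed maximum distance between neighbouring reads of one molecule.
    Returns {barcode: [(chrom, start, end, n_reads), ...]}.
    """
    by_barcode = defaultdict(list)
    for barcode, chrom, pos in reads:
        by_barcode[barcode].append((chrom, pos))

    molecules = {}
    for barcode, hits in by_barcode.items():
        hits.sort()  # order by chromosome, then position
        clusters = []
        chrom, start = hits[0]
        end, count = start, 1
        for c, p in hits[1:]:
            if c == chrom and p - end <= max_gap:
                # Close enough to the previous read: same molecule.
                end, count = p, count + 1
            else:
                # Gap too large (or new chromosome): start a new molecule.
                clusters.append((chrom, start, end, count))
                chrom, start, end, count = c, p, p, 1
        clusters.append((chrom, start, end, count))
        molecules[barcode] = clusters
    return molecules
```

The number of reads per molecule span is the kind of barcode density that a graph-based method could use as evidence for or against an SV.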

Page 3: SVs hackathon group report

1. Topsorter: Graph-based assessment of SVs using 10x Genomics

1.1 Get data (GIAB + NA12878) (done)

1.2 Call SVs (done)

1.3 Retrieve barcode information (done)

1.4 Construct a weighted DAG w/ chr + SVs (done)

1.5 Update the weights with barcode density and additional information (in progress)

1.6 Use topological sorting to find the longest path in the graph and see if SVs exist (done)

1.7 F1000 Paper (in progress)

Chromosome 20

Page 4: SVs hackathon group report

[Figure: graph representation of SV types between REF nodes: DEL; DUP with DUP_COPY; INV (S,E) with INV_FLIP (E,S). Values shown in the figure: 238, 377, 22.]

Construct a DAG with SV orientation

Update the graph weights with barcode densities

Find the longest path in a graph

Most likely haplotype for each chromosome
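
The longest-path step can be sketched as follows. This is a generic illustration of longest path in a weighted DAG via topological sorting (Kahn's algorithm), not the Topsorter implementation. The node names and edge weights in the usage example are hypothetical, and weights are assumed to be non-negative.

```python
from collections import defaultdict, deque

def longest_path(edges):
    """Return (total_weight, path) of the heaviest path in a weighted DAG.

    edges: list of (source, target, weight) tuples with non-negative weights.
    """
    graph = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for u, v, w in edges:
        graph[u].append((v, w))
        indegree[v] += 1
        nodes.update((u, v))

    # Topological order via Kahn's algorithm.
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")

    # Relax edges in topological order, keeping the heaviest way into each node.
    best = {n: 0.0 for n in nodes}
    prev = {n: None for n in nodes}
    for u in order:
        for v, w in graph[u]:
            if best[u] + w > best[v]:
                best[v] = best[u] + w
                prev[v] = u

    # Walk back from the node with the highest accumulated weight.
    end = max(order, key=lambda n: best[n])
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = prev[node]
    return best[end], path[::-1]

# Hypothetical example: a deletion edge that skips REF_mid has more
# barcode support than the reference route through it.
edges = [
    ("REF_left", "REF_mid", 2),
    ("REF_mid", "REF_right", 2),
    ("REF_left", "REF_right", 9),  # edge representing the deletion
]
weight, path = longest_path(edges)
```

Here the heaviest path takes the deletion edge, so the deletion would be part of the most likely haplotype under these made-up weights.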

Page 5: SVs hackathon group report

2. GLRSim/Mr. Fantastic: 10x Genomics read simulator

2.1 Agree on list of parameters (done)

2.2 Obtain real data (13 data sets) (done)

2.3 Obtain further insights based on real data (done)

2.4 Construct the pipeline (done)

2.5 Compare to real data (todo)

2.6 Write up as Bioinformatics applications note (in progress)
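
To make the idea of a linked-read simulator concrete, here is a minimal sketch. It is not the GLRSim pipeline, and every parameter name and default (molecule length distribution, 16-base barcodes, pairs per molecule, insert size) is an assumption chosen for illustration rather than the agreed parameter list.

```python
import random

def simulate_linked_reads(genome_length, n_molecules, n_barcodes,
                          mean_molecule_length=50_000, pairs_per_molecule=10,
                          read_length=150, insert_size=350, seed=0):
    """Return a list of (barcode, read1_start, read2_start) tuples.

    All parameters are hypothetical; positions are 0-based on a single
    linear genome of genome_length bases (must be >= insert_size).
    """
    rng = random.Random(seed)  # seeded for reproducible output
    barcodes = ["".join(rng.choice("ACGT") for _ in range(16))
                for _ in range(n_barcodes)]
    pairs = []
    for _ in range(n_molecules):
        # Draw a molecule length (assumed exponential) and clamp it.
        length = int(rng.expovariate(1.0 / mean_molecule_length))
        length = max(insert_size, min(length, genome_length))
        mol_start = rng.randint(0, genome_length - length)
        barcode = rng.choice(barcodes)  # all pairs of a molecule share it
        for _ in range(pairs_per_molecule):
            # Place a fragment inside the molecule; reads sit at its ends.
            frag_start = mol_start + rng.randint(0, length - insert_size)
            read2_start = frag_start + insert_size - read_length
            pairs.append((barcode, frag_start, read2_start))
    return pairs
```

A comparison to real data would then check that simulated molecule lengths and reads per barcode resemble those seen in the real data sets.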

Page 6: SVs hackathon group report
Page 7: SVs hackathon group report

3. DangerTrack: Difficult-to-assess regions

3.1 Collect SVs (1000 Genomes + GIAB) (done)

3.2 Collect repetitive regions (done)

3.3 Collect GC levels (done)

3.4 Collect breakpoint positions in windows of e.g. 5 kb (done)

3.5 Test pipeline for updates to GRCh38 (in progress)

3.6 Comparison to known problematic regions (done)

3.7 F1000 paper (in progress)

Page 8: SVs hackathon group report

● 5kb region

● Normalized breakpoints

● 1 - mappability

● GC content

● Combined score is the average of tracks.
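
The bullets above can be sketched as a small calculation. This is an illustration under stated assumptions, not the DangerTrack code: breakpoints are counted in 5 kb bins, counts are normalized by the maximum bin count (an assumed normalization), and the combined score is taken as the plain mean of all three tracks. The function and argument names are hypothetical.

```python
def bin_breakpoints(positions, chrom_length, bin_size=5_000):
    """Count breakpoints per fixed-size bin along one chromosome."""
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        counts[pos // bin_size] += 1
    return counts

def combined_score(breakpoint_counts, mappability, gc_content):
    """Per-bin mean of normalized breakpoints, 1 - mappability and GC content.

    mappability and gc_content are per-bin fractions in [0, 1].
    """
    # Avoid division by zero when no bin contains a breakpoint.
    max_count = max(breakpoint_counts, default=0) or 1
    scores = []
    for count, mapp, gc in zip(breakpoint_counts, mappability, gc_content):
        tracks = (count / max_count, 1.0 - mapp, gc)
        scores.append(sum(tracks) / len(tracks))
    return scores

# Hypothetical 15 kb chromosome with three 5 kb bins.
counts = bin_breakpoints([100, 7_500, 7_600, 12_000, 12_100, 12_200, 14_999],
                         chrom_length=15_000)
scores = combined_score(counts, mappability=[1.0, 0.5, 0.0],
                        gc_content=[0.4, 0.5, 0.6])
```

Bins with many breakpoints, low mappability and high GC content receive scores closer to 1 under this scheme.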

Example on hg19:

Chr3:

Chr5:

NCBI region list

ENCODE blacklist