Reconstruction of Haplotype Spectra from NGS Data

Ion Mandoiu, UTC Associate Professor in Engineering Innovation, Department of Computer Science & Engineering, University of Connecticut


Page 1: Reconstruction of Haplotype Spectra from  NGS  Data

Reconstruction of Haplotype Spectra from NGS Data

Ion Mandoiu
UTC Associate Professor in Engineering Innovation
Department of Computer Science & Engineering
University of Connecticut

Page 2: Reconstruction of Haplotype Spectra from  NGS  Data

Haplotype Spectra Reconstruction

• Given NGS reads, reconstruct:
  – Full length sequences
  – Sequence frequencies

• Example applications:
  – Single individual haplotyping
  – Allele specific transcriptome reconstruction
  – Viral quasispecies reconstruction

Page 3: Reconstruction of Haplotype Spectra from  NGS  Data

Single Individual Haplotyping

• Somatic cells are diploid, containing two nearly identical copies of each autosomal chromosome
  – Heterozygous loci found by mapping reads to reference genome
  – Long haplotype fragments can be generated by sequencing fosmid pools [Duitama et al. 2012]

Page 4: Reconstruction of Haplotype Spectra from  NGS  Data

RefHap Algorithm [Duitama et al. 12]

• Reduce the problem to Max-Cut
• Solve Max-Cut
• Build haplotypes according to the cut

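Fragment matrix ("*" = locus not covered by the fragment):

Locus  1  2  3  4  5
f1     *  0  1  1  0
f2     1  1  0  *  1
f3     1  *  *  0  *
f4     *  0  0  *  1

[Figure: graph on fragments f1 to f4 with edge weights 3, 1, 1, -1, -1]

Reconstructed haplotypes:
h1 00110
h2 11001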

Chr. 22, 32k SNPs, 14k fragments
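The three steps above can be sketched on the four-fragment example. This is a minimal illustration, not the RefHap implementation: the edge score (mismatches minus matches over shared loci) is an assumption chosen because it matches the weights shown in the figure, Max-Cut is solved by exhaustive search rather than a heuristic, and the helper names `weight`, `cut_value`, and `build_haplotypes` are hypothetical.

```python
from itertools import combinations, product

# Fragment matrix from the slide: one string per fragment, '*' = locus not covered.
fragments = {
    "f1": "*0110",
    "f2": "110*1",
    "f3": "1**0*",
    "f4": "*00*1",
}

def weight(a, b):
    """Mismatches minus matches over loci covered by both fragments (assumed score)."""
    shared = [(x, y) for x, y in zip(a, b) if x != "*" and y != "*"]
    return sum(1 if x != y else -1 for x, y in shared)

names = sorted(fragments)
edges = {}
for u, v in combinations(names, 2):
    # Keep an edge only when the two fragments share at least one locus.
    if any(x != "*" and y != "*" for x, y in zip(fragments[u], fragments[v])):
        edges[(u, v)] = weight(fragments[u], fragments[v])

def cut_value(side):
    """Total weight of edges whose endpoints lie on different sides."""
    return sum(w for (u, v), w in edges.items() if side[u] != side[v])

# Exhaustive Max-Cut (feasible only for tiny inputs); the first fragment is
# pinned to side 0 to avoid enumerating mirror-image cuts.
best_side, best_value = None, float("-inf")
for bits in product((0, 1), repeat=len(names) - 1):
    side = dict(zip(names, (0,) + bits))
    value = cut_value(side)
    if value > best_value:
        best_side, best_value = side, value

def build_haplotypes(side):
    """Majority vote per locus; h2 is taken as the complement of h1."""
    n_loci = len(next(iter(fragments.values())))
    h1 = []
    for j in range(n_loci):
        votes = 0  # positive => h1 carries allele 1 at locus j
        for name, frag in fragments.items():
            if frag[j] == "*":
                continue
            allele = int(frag[j])
            # A fragment on side 1 supports the complementary allele for h1.
            implied = allele if side[name] == 0 else 1 - allele
            votes += 1 if implied == 1 else -1
        h1.append("1" if votes > 0 else "0")  # ties default to allele 0
    h1 = "".join(h1)
    h2 = "".join("1" if c == "0" else "0" for c in h1)
    return h1, h2

h1, h2 = build_haplotypes(best_side)
print(edges)
print(best_side, best_value)
print(h1, h2)
```

On this input the cut separates f1 from f2, f3, f4, and the majority vote yields the haplotypes shown on the slide. Real instances (tens of thousands of SNPs and fragments) need a heuristic Max-Cut solver instead of enumeration.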

Page 5: Reconstruction of Haplotype Spectra from  NGS  Data

Haplotype Spectra Reconstruction

• Given short sequence fragments, reconstruct:
  – Full length sequences
  – Sequence frequencies

• Example applications:
  – Single individual haplotyping
  – Allele specific transcriptome reconstruction
  – Viral quasispecies reconstruction

Page 6: Reconstruction of Haplotype Spectra from  NGS  Data

Transcriptome Reconstruction Challenge: Alternative Splicing

[Griffith and Marra 07]

Page 7: Reconstruction of Haplotype Spectra from  NGS  Data

[Figure: a gene with exons 1 to 7 and four alternatively spliced transcripts]

t1: 1 2 3 4 5 6 7
t2: 1 3 4 5 6 7
t3: 1 2 3 4 5 7
t4: 1 3 4 5 7

Page 8: Reconstruction of Haplotype Spectra from  NGS  Data

TRIP: Transcriptome Reconstruction using Integer Programming

• Map the RNA-Seq reads to genome
• Construct Splice Graph G(V,E)
  – V: exons
  – E: splicing events
• Generate candidate transcripts
  – Depth-first-search (DFS)
• Filter candidate transcripts
  – Fragment length distribution (FLD)
  – Integer programming
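The candidate-generation step can be sketched as a depth-first enumeration of paths in the splice graph. The graph below is hypothetical (a seven-exon gene in which exons 2 and 6 may be skipped), the function name `candidate_transcripts` is invented for illustration, and the sketch assumes the graph is acyclic with a single first and last exon.

```python
# Hypothetical splice graph: vertices are exon ids, edges are splicing events.
splice_graph = {
    1: [2, 3],
    2: [3],
    3: [4],
    4: [5],
    5: [6, 7],
    6: [7],
    7: [],
}

def candidate_transcripts(graph, source, sink):
    """Enumerate all source-to-sink paths with an iterative depth-first search."""
    paths = []
    stack = [(source, [source])]
    while stack:
        exon, path = stack.pop()
        if exon == sink:
            paths.append(path)
            continue
        # Push successors in reverse so lower-numbered exons are explored first.
        for nxt in reversed(graph[exon]):
            stack.append((nxt, path + [nxt]))
    return paths

candidates = candidate_transcripts(splice_graph, 1, 7)
for i, path in enumerate(candidates, start=1):
    print(f"candidate {i}: {path}")
```

The number of paths can grow exponentially with the number of alternative splicing events, which is why a filtering step follows candidate generation.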

Page 9: Reconstruction of Haplotype Spectra from  NGS  Data

How to filter?

• Select the smallest set of putative transcripts that yields a good statistical fit between the fragment length distributions
  – empirically determined during library preparation
  – implied by “mapping” read pairs

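[Figure: candidate transcripts t1, t2, t3 over exons 1, 2, 3 (segments labeled 200 each); mapped read pairs imply fragment lengths of 500 or 300 depending on the transcript; fragment length distribution with Mean: 500; Std. dev. 50]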
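The idea behind the filter can be illustrated with the numbers on the slide: the same read pair implies a different fragment length depending on which transcript it is assigned to, and each implied length can be scored against the library's fragment length distribution. This sketch assumes a Gaussian FLD and a hypothetical read pair position; the function names are invented, and the integer program that selects the transcript set is not shown.

```python
import math

FLD_MEAN, FLD_STD = 500.0, 50.0         # values shown on the slide
EXON_LENGTH = {1: 200, 2: 200, 3: 200}  # exon lengths shown on the slide

def implied_fragment_length(transcript, left, right):
    """Length of the fragment spanned by a read pair within one transcript.

    `left` and `right` are (exon, offset) pairs: the fragment's first base and
    the position just past its last base.
    """
    start_of = {}
    position = 0
    for exon in transcript:
        start_of[exon] = position
        position += EXON_LENGTH[exon]
    (left_exon, left_off), (right_exon, right_off) = left, right
    return (start_of[right_exon] + right_off) - (start_of[left_exon] + left_off)

def z_score(length):
    """Distance from the FLD mean in standard deviations."""
    return (length - FLD_MEAN) / FLD_STD

def log_density(length):
    """Log of the normal density; a Gaussian FLD is an assumption of this sketch."""
    z = z_score(length)
    return -0.5 * z * z - math.log(FLD_STD * math.sqrt(2.0 * math.pi))

# Hypothetical read pair: starts 50 bp into exon 1, ends 150 bp into exon 3.
left, right = (1, 50), (3, 150)
for name, transcript in {"1-2-3": [1, 2, 3], "1-3": [1, 3]}.items():
    length = implied_fragment_length(transcript, left, right)
    print(name, length, z_score(length), round(log_density(length), 2))
```

With these assumptions the pair implies a 500 bp fragment in the transcript that includes exon 2 and a 300 bp fragment in the one that skips it, four standard deviations below the mean, so the pair is much better explained by the longer transcript.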

Page 10: Reconstruction of Haplotype Spectra from  NGS  Data

Allele Specific Expression

Page 11: Reconstruction of Haplotype Spectra from  NGS  Data

Haplotype Spectra Reconstruction

• Given short sequence fragments, reconstruct:
  – Full length sequences
  – Sequence frequencies

• Example applications:
  – Single individual haplotyping
  – Allele specific transcriptome reconstruction
  – Viral quasispecies reconstruction

Page 12: Reconstruction of Haplotype Spectra from  NGS  Data

RNA Virus Replication

High mutation rate (~10^-4)

Lauring & Andino, PLoS Pathogens 2011

Page 13: Reconstruction of Haplotype Spectra from  NGS  Data

Shotgun vs. Amplicon Reads

Shotgun reads: starting positions distributed ~uniformly

Amplicon reads have predefined start/end positions covering fixed overlapping windows

Page 14: Reconstruction of Haplotype Spectra from  NGS  Data

Reconstruction from Shotgun Reads: ViSpA

Input: Shotgun reads

• Read Error Correction
• Read Alignment
• Preprocessing of Aligned Reads
• Read Graph Construction
• Contig Assembly
• Frequency Estimation

Output: Quasispecies sequences w/ frequencies

Page 15: Reconstruction of Haplotype Spectra from  NGS  Data

Reconstruction from Amplicon Reads: VirA

Input: Reference in FASTA format; error-corrected read data in SAM/BAM format

• Estimate Amplicons
• Amplicon Read Graph
• Max-Bandwidth Paths
• Frequency Estimation

Output: Viral population variants with frequencies

Page 16: Reconstruction of Haplotype Spectra from  NGS  Data

Amplicon Read Graph

• K amplicons represented by K-layer read graph
• Vertices ⇔ distinct reads
• Edges ⇔ reads with consistent overlap
• Vertices have count function c(v)
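A minimal sketch of such a layered graph, under simplifying assumptions that are not stated on the slide: reads are already error-corrected, each read covers its amplicon exactly, and consecutive amplicons overlap by a fixed, positive number of bases. The function name `build_amplicon_read_graph` and the toy reads are invented for illustration.

```python
from collections import Counter

def build_amplicon_read_graph(amplicon_reads, overlap):
    """amplicon_reads[k] is the list of error-corrected reads for amplicon k.

    Consecutive amplicons are assumed to overlap by exactly `overlap` (> 0) bases.
    Returns (count, edges): count[(k, read)] is c(v); edges link vertices in
    layers k and k+1 whose overlapping bases agree.
    """
    count = {}
    layers = []
    for k, reads in enumerate(amplicon_reads):
        tally = Counter(reads)          # collapse identical reads into one vertex
        layers.append(sorted(tally))
        for read, c in tally.items():
            count[(k, read)] = c
    edges = []
    for k in range(len(layers) - 1):
        for left in layers[k]:
            for right in layers[k + 1]:
                # Consistent overlap: suffix of the left read equals prefix of the right.
                if left[-overlap:] == right[:overlap]:
                    edges.append(((k, left), (k + 1, right)))
    return count, edges

# Toy data, invented for illustration: two amplicons overlapping by 2 bases.
amplicon_reads = [
    ["ACGT", "ACGT", "ACCT"],
    ["GTAA", "CTAA", "CTAA", "GTAC"],
]
count, edges = build_amplicon_read_graph(amplicon_reads, overlap=2)
print(count)
print(edges)
```

Every path that visits one vertex per layer spells a candidate full-length variant, and the counts c(v) along the path carry the information used for frequency estimation.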

Page 17: Reconstruction of Haplotype Spectra from  NGS  Data

Read Graph Transformation

• Heuristic to reduce edges in dense graphs
• Replace bipartite cliques with star subgraphs
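The edge saving from this replacement can be sketched as follows. A complete bipartite subgraph between m left vertices and n right vertices has m·n edges; routing the same connections through one auxiliary hub vertex needs only m + n. How such cliques are detected in the read graph is not covered here, and the function names and the hub vertex label are hypothetical.

```python
def clique_edges(left, right):
    """All edges of the complete bipartite graph between `left` and `right`."""
    return [(u, v) for u in left for v in right]

def star_edges(left, right, hub):
    """Same connectivity routed through one auxiliary hub vertex."""
    return [(u, hub) for u in left] + [(hub, v) for v in right]

def reachable_pairs(edges, left, right):
    """Pairs (u, v) with u in left, v in right joined by a directed path."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    pairs = set()
    for u in left:
        seen, stack = set(), [u]
        while stack:
            node = stack.pop()
            for nxt in succ.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        pairs.update((u, v) for v in right if v in seen)
    return pairs

left = [f"a{i}" for i in range(4)]
right = [f"b{j}" for j in range(5)]
dense = clique_edges(left, right)
sparse = star_edges(left, right, hub="hub")
print(len(dense), len(sparse))
print(reachable_pairs(dense, left, right) == reachable_pairs(sparse, left, right))
```

In this toy case 20 edges shrink to 9 while every left-to-right connection is preserved; algorithms that run on the transformed graph must treat the hub as a pass-through vertex rather than a read.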

Page 18: Reconstruction of Haplotype Spectra from  NGS  Data

Challenges

• Scalability
  – Exploit inherent sparsity of biological instances
  – E.g., exact scaffolding algorithm using non-serial dynamic programming based on SPQR trees

• Flexibility
  – Long (noisy) reads + short reads
  – Heterogeneous data, e.g., RNA-Seq + TSSeq + PolyA-Seq

• Quantifying reconstruction uncertainty
  – Compute intensive, e.g., bootstrapping


Page 19: Reconstruction of Haplotype Spectra from  NGS  Data

Acknowledgements

Jorge Duitama, Sahar Al Seesi, Mazhar Kahn, Rachel O’Neill

Alexander Artyomenko, Adrian Caciula, Nicholas Mancuso, Serghei Mangul, Bassam Tork, Alex Zelikovsky, Irina Astrovskaya, Pavel Skums