Bioinformatic approaches to 454 data analysis for HIV
10th European Workshop Meeting on HIV & Hepatitis
Marc Noguera i Julian Barcelona, 29/03/2012
Presented at the 10th EU Meeting on HIV & Hepatitis, 28 - 30 March 2012, Barcelona
Summary
1 – Technology Introduction
2 – Specs & Performance
3 – Applications
4 – Limits and sources of error
5 – Bioinformatic Tools
Technology Introduction (I)
Sanger based, globally used for resistance testing:
- The chromatogram signal averages over the whole population of DNA sequences
- As a result, only viral variants present in 20% or more of the population can be detected
- Characterization of minority variants requires many (>100) cloning and sequencing steps
Allele-Specific PCR:
- Shows very high sensitivity
- Limited to a pre-defined (small) set of mutations of interest
Technology Introduction (II)
- Amplicons 1 to 5; targets: Protease (PI), RT (NRTI+NNRTI)
- Single Clone
- Emulsion PCR
- Picotiter Plate Loading
- Single template sequencing - Flowgram
Specs & Performance
Two different platforms for large- and small-scale projects.

                          FLX+       FLX        GS Junior
Median Read Length (bp)   800        450        450
Read Number               750,000    750,000    80,000
Samples/Run               >40        25-40      5-8
Run Time (Lab)            7 days     7 days     4 days
Analysis                  ?          ?          ?
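As a rough back-of-the-envelope check, the read and sample numbers on this slide can be turned into an expected number of reads per sample. This is only a sketch: it assumes reads are split evenly across samples, which real pooled runs rarely achieve, and `reads_per_sample` is a hypothetical helper name. For FLX+ the slide only gives ">40", so 40 is used as a lower bound on the sample count.

```python
# Reads and samples per run as listed on the slide.
# Sample counts are (min, max); FLX+ is listed as ">40", so 40 is a lower bound.
PLATFORMS = {
    "FLX+": {"reads": 750_000, "samples": (40, 40)},
    "FLX": {"reads": 750_000, "samples": (25, 40)},
    "GS Junior": {"reads": 80_000, "samples": (5, 8)},
}


def reads_per_sample(reads, samples):
    """Expected reads per sample, assuming an even split (an idealisation)."""
    return reads // samples


for name, spec in PLATFORMS.items():
    fewest_samples, most_samples = spec["samples"]
    high = reads_per_sample(spec["reads"], fewest_samples)
    low = reads_per_sample(spec["reads"], most_samples)
    print(f"{name}: {low} to {high} reads per sample")
```

These totals are before any quality filtering and are shared among all amplicons sequenced for a sample.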
Applications - HIV
High Sensitivity Genotyping: 1,000 - 20,000 sequences per sample
Population characterization
Low Frequency Resistant Mutants
Low-Level X4 Tropic Viruses
Resistance & Tropism Dynamics
Population Reconstruction
Limits and Sources of error
Sampling
- A viral load of approx. 30,000 is needed to reliably detect a 0.1% variant
- Primer design may introduce amplification biases
PCR Error
- Depends on the error rates of the RT-PCR polymerases
- May introduce false mutations, virtually indistinguishable from true ones
Recombination
- Depends on the polymerases used, PCR definition and extension times
- May produce chimeric sequences
- Broad range of recombination rates
Sequencing
- Carry Forward - Incomplete Extension (CaFIE) Error
- Homopolymer Error
References:
Paredes et al. J. Virol. Methods, 2007, 146, 136.
Zagordi et al. Nucleic Acids Res., 2010, 38, 7400.
Margulies et al. Nature, 2005, 437, 376.
Lahr and Katz. BioTechniques, 2009, 47, 857.
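The sampling limit on this slide can be illustrated with a simple binomial calculation. The sketch below assumes that every viral genome counted in the viral load becomes an independent input template (optimistic, since extraction and reverse transcription lose material) and that a variant needs at least `k_min` template copies to be seen. Both `prob_at_least` and the threshold of 10 copies are hypothetical choices for illustration, not values from the talk.

```python
from math import comb


def prob_at_least(n_templates, freq, k_min):
    """P(at least k_min variant templates) under Binomial(n_templates, freq)."""
    p_fewer = sum(
        comb(n_templates, k) * freq**k * (1 - freq) ** (n_templates - k)
        for k in range(k_min)
    )
    return 1.0 - p_fewer


# A 0.1% variant among 30,000 templates has an expected 30 copies.
for n in (3_000, 30_000):
    expected = n * 0.001
    p = prob_at_least(n, 0.001, k_min=10)  # k_min=10 is an arbitrary threshold
    print(f"{n} templates: {expected:.0f} expected copies, P(>=10 copies) = {p:.4f}")
```

With ten times fewer templates the same variant is expected only about three times, so it will often be missed or be represented by too few copies to separate it from error.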
Error Patterns
Error types: Mismatches, Insertions, Deletions
- Error is not random; it shows patterns
- InDel errors are the most common error
- For read lengths > 400 bp, the majority of reads contain errors
- Error patterns are critical for amplicon sequencing designs
Gilles et al. BMC Genomics. 2011. doi:10.1186/1471-2164-12-245
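To see why long reads mostly contain errors, consider the simplest possible model: each base is wrong independently with the same probability. The slide itself notes that real errors are not random, so this is only a rough sketch, and the per-base rates printed below are hypothetical values, not measurements from the cited study.

```python
def prob_read_has_error(per_base_error, read_length):
    """P(at least one error in a read) if bases fail independently."""
    return 1.0 - (1.0 - per_base_error) ** read_length


def error_rate_for_half_reads(read_length):
    """Per-base error rate at which half of the reads contain an error."""
    return 1.0 - 0.5 ** (1.0 / read_length)


threshold = error_rate_for_half_reads(400)
print(f"Half of 400 bp reads carry an error at a per-base rate of {threshold:.4%}")

for rate in (0.001, 0.005, 0.01):  # hypothetical per-base error rates
    p = prob_read_has_error(rate, 400)
    print(f"rate {rate:.3f}: {p:.1%} of 400 bp reads contain an error")
```

Under this model, any per-base error rate above roughly 0.17% already means that most 400 bp reads contain at least one error.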
Applications
Raw Reads
Bioinformatician Around? YES / NO
Error Correction: AmpliconNoise [1], V-Phaser/RC454 [2], Shorah [3]
Haplotype Reconstruction: ViSPA, QuRe
Sequence Aligners: Mosaik, BWA, MAFFT, MUSCLE
In-House Code Pipeline
Variant Detection, Resistance Interpretation, Tropism Prediction, Diversity Analysis: AVA®, Segminator, DataMonkey, Geno2Pheno[454], DeepChek®, PyroDyn/Mut®
[1] Quince, C., BMC Bioinformatics, 2011, 12, 38
[2] Macalalad, AR., PLoS Comp Biol, in press
[3] Zagordi, O., BMC Bioinformatics, 2011, 12, 119
Resistance – UI – AVA (Roche – 454)
Amplicon Variant Analyzer (AVA)
Views: Alignment & Flowgram Browsing, Variant Explorer
Features:
- Multiple samples
- Search for pre-defined mutation set
- New variant discovery
- Export tools to standard formats and tables
Resistance – UI – Segminator
Segminator II
Views: Sequencing Browser, Positional Browser, Quick Trees For Selected Regions
Tracks: REFERENCE, COVERAGE, NON-CONSENSUS, ENTROPY, INSERTIONS, DELETIONS
Features:
• Little pre-processing
• Filters quality reads
• Own Reference
• Diversity estimation
• Phylogenetic tools
Archer et al. http://www.bioinf.manchester.ac.uk/segminator/
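Per-position tracks such as coverage, non-consensus fraction and entropy can be computed from a set of aligned reads along the following lines. This is a minimal sketch and not Segminator's own implementation: it assumes the reads are already aligned to the same coordinates and padded with `-` to equal length, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (bits) of the bases in one alignment column, gaps ignored."""
    counts = Counter(base for base in column if base != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * log2(total / c) for c in counts.values())


def positional_profile(aligned_reads):
    """Per position: (coverage, non-consensus fraction, entropy)."""
    profile = []
    for column in zip(*aligned_reads):
        bases = [base for base in column if base != "-"]
        coverage = len(bases)
        if coverage:
            top_count = Counter(bases).most_common(1)[0][1]
            non_consensus = 1 - top_count / coverage
        else:
            non_consensus = 0.0
        profile.append((coverage, non_consensus, column_entropy(column)))
    return profile


reads = ["ACGT", "ACGA", "AC-T", "ATGT"]  # toy aligned reads
for position, row in enumerate(positional_profile(reads), start=1):
    print(position, row)
```

Positions where all reads agree have zero entropy; entropy rises as minority bases become more frequent.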
HIV Tropism – UI – Geno2Pheno[454]
• Web interface to a tropism predictor for 454 sequences
• Needs some pre-processing on a local computer
• Filters applied to sequences
• Tropism analyzed for each sequence
• Several samples at a time
• Full report for each sample, web-shareable
FPR, Cons-Dist
http://g2p-454.bioinf.mpi-inf.mpg.de/index.php
Daumer et al. BMC Biomed. Inf. & Decision Making. 2011, 11, 30
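Once each read has its own false-positive rate (FPR), a sample-level summary is just the share of reads classified as X4. The sketch below assumes that a read is called X4 when its FPR is at or below a chosen cutoff; the cutoff value and the FPR numbers are made up for illustration, and `x4_fraction` is a hypothetical helper, not part of the web tool.

```python
def x4_fraction(fprs, fpr_cutoff):
    """Fraction of reads whose FPR is at or below the cutoff (called X4 here)."""
    if not fprs:
        raise ValueError("no reads to classify")
    x4_reads = sum(1 for fpr in fprs if fpr <= fpr_cutoff)
    return x4_reads / len(fprs)


# Toy per-read FPR values in percent (invented for this example).
sample_fprs = [0.5, 2.0, 40.0, 75.0, 90.0, 12.0, 60.0, 3.0]
cutoff = 3.5  # hypothetical cutoff, in percent
print(f"{x4_fraction(sample_fprs, cutoff):.1%} of reads called X4")
```

Both the FPR cutoff and the minimum share of X4 reads needed to report a sample as X4-containing are choices that should be taken from validated guidance rather than from this example.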
Bioinformatics - YES
• Generally requires programming an in-house pipeline.
• Detection limit established on internal controls.
• Sensitivity limit generally established around 0.5-1.0%.
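One way to derive a detection limit from an internal control is to sequence a clonal template and treat the highest non-reference frequency observed there as the error floor. The sketch below follows that idea under stated assumptions: the counts are toy numbers, the mutation labels are only example keys, and both function names are hypothetical.

```python
def detection_limit(control_counts):
    """Highest non-reference frequency seen in a clonal control.

    control_counts: list of (non_reference_reads, coverage), one per position.
    """
    freqs = [alt / cov for alt, cov in control_counts if cov > 0]
    return max(freqs, default=0.0)


def call_variants(sample_counts, limit):
    """Keep positions whose variant frequency exceeds the detection limit."""
    return {
        name: alt / cov
        for name, (alt, cov) in sample_counts.items()
        if cov > 0 and alt / cov > limit
    }


control = [(3, 1000), (5, 1000), (0, 1200)]  # toy clonal-control counts
limit = detection_limit(control)

sample = {"K103N": (40, 2000), "M184V": (6, 2000)}  # toy sample counts
print(f"detection limit: {limit:.2%}")
print(call_variants(sample, limit))
```

In this toy case the control gives a 0.5% limit, so a 2% variant is reported and a 0.3% variant is discarded.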
Pipeline:
- Raw Reads
- Filtering / Correction: Quality Scores, Ambiguous Bases, Insertion Deletions, Length
- Quality Improved reads
- Reference Alignment
- Variant Calling
- Variant Filtering (x N)
- Multiple Alignment
- Phylogeny Inference
- Population Reconstruction
- Interpretation
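The filtering step of such an in-house pipeline could start from something like the sketch below, which drops reads that are too short, contain ambiguous bases, or have a low mean quality score. The thresholds are hypothetical defaults and the record layout `(name, sequence, qualities)` is an assumption; indel correction is not covered here.

```python
def passes_filters(seq, quals, min_length=200, min_mean_quality=25, max_ambiguous=0):
    """Return True if a read passes length, ambiguity and quality filters."""
    if len(seq) < min_length:
        return False
    if seq.upper().count("N") > max_ambiguous:
        return False
    if not quals or sum(quals) / len(quals) < min_mean_quality:
        return False
    return True


def filter_reads(reads, **thresholds):
    """Keep (name, seq, quals) records that pass every filter."""
    return [read for read in reads if passes_filters(read[1], read[2], **thresholds)]


# Toy reads; min_length is lowered so the short example sequences can pass.
reads = [
    ("ok", "ACGTACGT", [30] * 8),
    ("short", "ACG", [30] * 3),
    ("ambiguous", "ACGNACGT", [30] * 8),
    ("low_quality", "ACGTACGT", [10] * 8),
]
kept = filter_reads(reads, min_length=4)
print([name for name, _, _ in kept])
```

Keeping each criterion as a separate check makes it easy to report how many reads each filter removes.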
Example 1: Resistance Dynamics
Figure 3. HIV-1 variant dynamics before, during and after treatment. The most common variants at each time point are illustrated as circles (if recurring) or as cubes (if not recurring). The genetic distance of each variant, in nucleotide changes/site from the most frequent variant at the first time point, is plotted over time. The frequency of each variant is proportional to the area of its circle or cube.
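The distance plotted in this kind of figure can be approximated by the proportion of differing sites between each variant and the most frequent variant of the first time point. The sketch below uses this uncorrected p-distance, which may differ from the measure used for the published figure; it assumes aligned sequences of equal length, and the function names and toy data are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences (gaps skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        differences += a != b
    return differences / compared if compared else 0.0


def distances_from_reference(variants_by_time):
    """Distance of every variant to the most frequent variant at time point 0.

    variants_by_time: list of {sequence: frequency} dicts, one per time point.
    """
    first = variants_by_time[0]
    reference = max(first, key=first.get)
    return [
        {seq: p_distance(reference, seq) for seq in time_point}
        for time_point in variants_by_time
    ]


timeline = [
    {"ACGTACGTAC": 0.9, "ACGTACGTAA": 0.1},  # toy data, first time point
    {"ACGAACGTAA": 1.0},                     # toy data, later time point
]
for index, distances in enumerate(distances_from_reference(timeline)):
    print(index, distances)
```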
Example 2: V3 Population Reconstruction
Figure 3. Minimum spanning tree (MST) of V3 sequences from subject DS1. V3 sequences generated by deep sequencing were used to construct MSTs. Identical nucleotide sequences are grouped in one node, and the circle size is proportional to the abundance of that particular V3 sequence. The length of the connecting branches corresponds to the number of nucleotide differences between the two connected nodes. Timepoints are color-coded, using bright colors for PBMC samples and corresponding soft colors for serum samples.
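A tree of this kind can be sketched by collapsing identical reads into nodes, measuring the number of nucleotide differences between nodes, and linking them with Prim's algorithm. This is an illustration under stated assumptions rather than the method used for the figure: reads are assumed to be aligned and of equal length, and the function names are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(reads):
    """Collapse identical reads into nodes and link them with Prim's algorithm.

    Returns (abundance, edges); edges are (seq_a, seq_b, n_differences).
    """
    abundance = Counter(reads)
    nodes = sorted(abundance)
    if not nodes:
        return abundance, []
    in_tree = {nodes[0]}
    edges = []
    while len(in_tree) < len(nodes):
        # Pick the cheapest edge joining the tree to a node outside it.
        dist, a, b = min(
            (hamming(a, b), a, b)
            for a in in_tree
            for b in nodes
            if b not in in_tree
        )
        in_tree.add(b)
        edges.append((a, b, dist))
    return abundance, edges


toy_reads = ["AAAA", "AAAA", "AAAT", "AATT", "AAAA"]
abundance, edges = minimum_spanning_tree(toy_reads)
print(dict(abundance))  # node sizes
print(edges)            # branches with their lengths
```

The abundance of each node corresponds to the circle size in the figure, and the difference count on each edge to the branch length.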
454 Applications - Summary
Table comparing tools by: Error Correction, Reference Mapping, Phylogeny, Variant Calling, Haplotype Reconstruction, Resistance Analysis, UI, Reference, Free Software (cell values not preserved in this transcript).
Tools: AVA [1], AmpliconNoise [2], Shorah [3], Segminator [4], DataMonkey [5], QuRe [6], ViSPA [7], V-Phaser [8], PyroDyn, DeepChek, Geno2Pheno[454] [9]
[1] Margulies et al. Nature, 2005, 437, 376.
[2] Quince, C., BMC Bioinformatics, 2011, 12, 38
[3] Zagordi, O., BMC Bioinformatics, 2011, 12, 119
[4] http://www.bioinf.manchester.ac.uk/segminator/
[5] Delport et al. Bioinformatics, 2010, PMID: 20671151
[6] Prosperi et al. Bioinformatics, 2012, 28, 132
[7] Astrovskaya et al. BMC Bioinformatics, 2011; 12(Suppl 6): S1.
[8] Macalalad, AR., PLoS Comp Biol, in press
[9] Daumer et al. BMC Biomed. Inf. & Decision Making. 2011, 11, 30
Acknowledgements
Funding Molecular Epidemiology Group
Rocio Bellido Maria Casadellà
Elisabeth Gómez Roger Paredes
Susana Pérez Christian Pou
Cristina Rodriguez Teresa Sequeros
Put a bioinformatician in your life