Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics
Target Enrichment: Understanding the output
Andrea Telatin BMR Genomics
Today's menu:
• Disease research applications for TE panels
• bioinformatic analysis of the data…
• …and how to handle the output
using a Cardiomyopathy panel as a test case
Why? Technology Overview
[Figure: sequencing target sizes. Whole genome: 3.3 Gb; Exome Seq: 50 Mb; Custom Panels: 0.5 Mb. Source: Gilissen, Genome Biol 2011]
With Custom TE:
• Finding relevant variants
• Spending less
• Focus on Your Favourite Genes
Case study
Antonio Puerta
Cardiomyopathies
• Targets the most common causative SNPs for:
• ARVC (Arrhythmogenic right ventricular cardiomyopathy)
• Brugada Syndrome
• Long QT
• Hypertrophic cardiomyopathy
The Panel
Cardiomyopathies
• We designed a panel for CMPD
• Platform of choice: Agilent HaloPlex
• Sequencer: Illumina MiSeq (PE 2x150)
• 56 targeted genes (165 regions)
• 500 kb target size
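As a rough illustration of where a figure such as "500 kb target size" comes from, the sketch below sums the lengths of a set of targeted regions after merging overlaps. The coordinates are invented for the example (they are not the real panel design), and the regions are assumed to use BED-style coordinates: 0-based start, end not included.

```python
def target_size(regions):
    """Total number of bases covered by (chrom, start, end) regions.

    Assumes BED-style coordinates (0-based start, end excluded).
    Overlapping or touching regions on the same chromosome are merged
    first, so no base is counted twice.
    """
    total = 0
    current = None  # (chrom, start, end) of the interval being extended
    for chrom, start, end in sorted(regions):
        if current and chrom == current[0] and start <= current[2]:
            current = (chrom, current[1], max(current[2], end))
        else:
            if current:
                total += current[2] - current[1]
            current = (chrom, start, end)
    if current:
        total += current[2] - current[1]
    return total


# Hypothetical regions, for illustration only
toy_regions = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),  # overlaps the previous region
    ("chr2", 0, 50),
]
print(target_size(toy_regions))
```

Merging before summing matters because enrichment designs often contain overlapping regions; adding raw lengths would overestimate the target.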
The Panel
Output at a glance
• Sequenced 44 samples so far
• Average coverage: 232X (±36X)
• Reads on target: 99.6%
• Target covered at > 5X: 95.6%
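The two coverage figures can be computed from per-base depths over the target. This is a minimal sketch on invented depth values, not the numbers from the 44 samples; the function name and the toy data are assumptions made for illustration.

```python
from statistics import mean


def coverage_summary(depths, min_depth=5):
    """Return (mean depth, fraction of positions with depth > min_depth)."""
    if not depths:
        raise ValueError("no positions given")
    covered = sum(1 for d in depths if d > min_depth)
    return mean(depths), covered / len(depths)


# Hypothetical per-base depths over a tiny target
toy_depths = [0, 3, 10, 250, 240, 230, 6, 5]
avg, frac = coverage_summary(toy_depths)
print(f"Average coverage: {avg:.1f}X, target > 5X: {frac:.1%}")
```

Note that a position with depth exactly 5 does not count as covered here, because the slide's threshold is written as "> 5X".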
How? Bioinformatic Analysis
• Target enrichment + Library Preparation
• Sequencing
• Alignment against reference
• Local realignment
• Variant calling
• Variant annotation
• Data mining
Format: SAM
Format: VCF
Sequence alignment
This is a hard example.
That is another easy example.

This is a --hard---- example.
||  |||||   | |     |||||||||
That is another easy example.

This is a-- h-ard---- example.
||  |||||   |  |     |||||||||
That is anothe-r easy example.

This is a hard example.------
||  |||||        |   |
That is another easy example.
Gap Cost
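To make the idea of a gap cost concrete, the sketch below scores two of the alignments of the example sentences column by column. The scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions chosen for illustration, not values taken from the slides.

```python
def alignment_score(top, bottom, match=1, mismatch=-1, gap=-2):
    """Score two already-aligned strings column by column ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


gapped = alignment_score("This is a --hard---- example.",
                         "That is another easy example.")
ungapped = alignment_score("This is a hard example.------",
                           "That is another easy example.")
print(gapped, ungapped)
```

With these scores the gapped alignment gets 1 and the shifted, ungapped one gets -17: both pay for six gap characters, but the first earns 18 matching columns against 9 for the second. Raising or lowering the gap penalty changes which alignment an aligner prefers.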
To discover more…
• The standard algorithms for sequence alignment are Needleman-Wunsch and Smith-Waterman
• For large sequences the standard is BLAST
• For short reads one of the most popular choices is BWA (uses BWT)
• Interesting CUDA-enabled implementations
2003 Thesis
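For readers who want to see one of these algorithms, here is a minimal sketch of Needleman-Wunsch global alignment that returns only the best score (no traceback). It assumes a linear gap cost and the same arbitrary scoring values as a teaching example; real aligners such as BWA use very different, index-based strategies.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of a and b with a linear gap cost."""
    # prev is the previous row of the dynamic programming table:
    # prev[j] is the best score for the first i-1 characters of a
    # aligned against the first j characters of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("ACGT", "AGT"))
```

Keeping only two rows of the table is enough when just the score is needed; recovering the alignment itself requires storing the full table (or back-pointers).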
Sequence alignment
[Figure: short reads aligned to chromosomes (reference)]
Challenges
• Millions of reads to be aligned
• Short reads are less likely to be “unique”
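The uniqueness problem can be illustrated on a toy sequence: the shorter the substring, the more often it occurs in more than one place. The reference below is invented and far smaller than a genome, so the numbers only show the trend.

```python
from collections import Counter


def unique_fraction(reference, k):
    """Fraction of the k-long substrings of reference that occur exactly once."""
    if k < 1 or k > len(reference):
        raise ValueError("k must be between 1 and len(reference)")
    kmers = [reference[i:i + k] for i in range(len(reference) - k + 1)]
    counts = Counter(kmers)
    return sum(1 for kmer in kmers if counts[kmer] == 1) / len(kmers)


toy_reference = "ACGTACGTTACGTACGA"  # made-up sequence with repeats
for k in (2, 4, 8):
    print(k, round(unique_fraction(toy_reference, k), 2))
```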
The SAM/BAM formats
• SAM (Sequence Alignment/Map) is a plain-text format designed for short-read alignments; BAM is its compressed binary equivalent
• It is hard for humans to read, because it was designed for machines
• It has been a major improvement in NGS analyses
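A SAM alignment line is a row of tab-separated fields, the first eleven of which are mandatory. The sketch below splits one line into those fields; the read name and reference name in the example line are invented, and optional tags after the eleventh column are simply ignored.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str  # read name
    flag: int   # bitwise flag (for example 16 = read on the reverse strand)
    rname: str  # reference sequence name
    pos: int    # 1-based leftmost mapping position
    mapq: int   # mapping quality
    cigar: str  # compact description of the alignment, e.g. 8M2I4M1D3M
    rnext: str  # reference name of the mate ("=" means same as rname)
    pnext: int  # position of the mate
    tlen: int   # observed template length
    seq: str    # read sequence
    qual: str   # base qualities ("*" when not stored)


def parse_sam_line(line):
    """Parse the 11 mandatory columns of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 fields")
    q, f, r, p, m, c, rn, pn, t, s, ql = fields[:11]
    return SamRecord(q, int(f), r, int(p), int(m), c, rn, int(pn), int(t), s, ql)


# Hypothetical record, for illustration only
example = "read1\t99\tchr1\t7\t30\t8M2I4M1D3M\t=\t37\t39\tTTAGATAAAGGATACTG\t*"
record = parse_sam_line(example)
print(record.rname, record.pos, record.cigar, bool(record.flag & 16))
```

Header lines (those starting with "@") are not handled here; in practice a dedicated library is a safer choice than hand-written parsing.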
SAM DATA
Sequence realignment
• Sequence alignment is (mostly) done one sequence at a time
• At the end we can “rethink” the choices made while aligning, looking at the whole picture
Variants?
Errors!
• Once the alignment is “cleaned”, variant calling becomes a little bit easier.
• Several aspects are involved, far more than merely “counting differences”
• These aspects are complex and interesting… but we are not talking about them today!
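To show what "counting differences" means, and why it is only a starting point, here is a deliberately naive sketch that piles up ungapped reads on a reference and reports a substitution when enough reads disagree. The reference, the reads and both thresholds are invented for the example; real callers also model base qualities, mapping qualities, indels and genotypes.

```python
from collections import Counter


def naive_calls(reference, reads, min_depth=3, min_fraction=0.3):
    """Call substitutions by counting bases at each reference position.

    reads is a list of (start, sequence) pairs: 0-based start on the
    reference, ungapped alignment (no insertions or deletions).
    Returns (1-based position, ref base, alt base, alt count, depth) tuples.
    """
    pileup = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            if 0 <= start + offset < len(reference):
                pileup[start + offset][base] += 1
    calls = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue
        for base, n in counts.items():
            if base != reference[pos] and n / depth >= min_fraction:
                calls.append((pos + 1, reference[pos], base, n, depth))
    return calls


toy_reference = "ACGTACGTAC"
toy_reads = [
    (0, "ACGTAC"),
    (2, "GTTCGT"),  # T at reference position 5 (reference has A)
    (3, "TTCGTA"),  # same difference again
    (4, "TCGTAC"),  # and again: looks like a real variant
    (4, "ACGAAC"),  # a lone A at position 8 (reference has T): likely an error
]
print(naive_calls(toy_reference, toy_reads))
```

Position 5 is reported because three of five reads carry the alternative base, while the single discordant read at position 8 falls below the fraction threshold and is treated as noise.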
The VCF format
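A VCF data line has eight fixed, tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), optionally followed by per-sample columns. The sketch below parses the fixed part only; the example record is invented for illustration.

```python
def parse_vcf_line(line):
    """Split one VCF data line into its eight fixed columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line has at least 8 columns")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            info_dict[key] = value if sep else True  # flags carry no value
    return {
        "CHROM": chrom,
        "POS": int(pos),          # 1-based position
        "ID": vid,
        "REF": ref,
        "ALT": alt.split(","),    # several alternative alleles are allowed
        "QUAL": None if qual == "." else float(qual),
        "FILTER": flt,
        "INFO": info_dict,
    }


# Hypothetical record, for illustration only
example = "chr1\t12345\t.\tG\tA\t50\tPASS\tDP=232;DB"
variant = parse_vcf_line(example)
print(variant["CHROM"], variant["POS"], variant["REF"], variant["ALT"])
```

Header lines (starting with "#") and genotype columns are left out to keep the sketch short.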
Annotation
Chromosomes (reference)
Short reads
Genes/Transcripts
G>G Y>. C>W
Amino acid changes
Functional annotation
Disease database
Effect predictors
Literature links
VEP: Variant Effect Predictor
• genes and transcripts affected by the variants
• location of the variants (e.g. upstream, in coding sequence, in non-coding RNA, in regulatory regions)
• consequence of your variants on the protein sequence (e.g. stop gained, missense, stop lost, frameshift)
• known variants that match yours, and associated minor allele frequencies from the 1000 Genomes Project
• SIFT and PolyPhen scores for changes to protein sequence
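The consequence terms listed above can be illustrated with a small sketch that compares a reference codon with a mutated one using the standard genetic code. This is not how VEP is implemented; it is an assumption-laden toy that only handles single-codon substitutions, so frameshifts and anything outside the coding sequence are out of scope.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_consequence(ref_codon, alt_codon):
    """Classify a codon substitution by its effect on the protein."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop gained"
    if ref_aa == "*":
        return "stop lost"
    return "missense"


for ref, alt in [("TGG", "TGA"), ("GAA", "GTA"), ("CTT", "CTC"), ("TAA", "CAA")]:
    print(ref, alt, CODON_TABLE[ref] + ">" + CODON_TABLE[alt],
          codon_consequence(ref, alt))
```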
ANNOVAR
ANNOVAR is an efficient tool to functionally annotate genetic variants.
• Gene-based annotation: identify whether SNPs or CNVs cause protein coding changes and the amino acids that are affected.
• Region-based annotations: identify variants in specific genomic regions, for example, conserved regions among 44 species, predicted transcription factor binding sites,…
• Filter-based annotation: identify variants that are reported in dbSNP, or identify the subset of common SNPs (MAF>1%) in the 1000 Genomes Project, or identify subset of non-synonymous SNPs with SIFT score>0.05, …
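The idea behind filter-based annotation can be sketched in a few lines: look each variant up in a frequency database and separate the common ones (MAF > 1%) from the rest. This does not use ANNOVAR or its file formats; the record layout, the "maf" key and the variant names are hypothetical.

```python
def split_by_frequency(variants, maf_threshold=0.01):
    """Separate variants into (rare_or_unknown, common) using a MAF cut-off."""
    rare, common = [], []
    for variant in variants:
        maf = variant.get("maf")  # None when absent from the database
        if maf is not None and maf > maf_threshold:
            common.append(variant)
        else:
            rare.append(variant)
    return rare, common


# Hypothetical annotated variants, for illustration only
toy_variants = [
    {"name": "var1", "maf": 0.25},
    {"name": "var2", "maf": 0.002},
    {"name": "var3", "maf": None},  # never seen in the database
]
rare, common = split_by_frequency(toy_variants)
print([v["name"] for v in rare], [v["name"] for v in common])
```

Variants missing from the database are kept with the rare ones here, on the assumption that an unseen variant is more likely to be interesting than a common polymorphism.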
IGV can open: ALIGNMENTS (BAM), ANNOTATIONS (BED), VARIANTS (VCF)
Any questions?
Summarizing!
• Target enrichment: many individuals sequenced on genes of interest
• SAM/BAM formats to store alignments
• The IGV program to visualise tracks (including alignments)
• The VCF format to store genomic variations
• Annotation programs add information (affected genes, consequences, known variants) to a flat file of variants
Acknowledgments: BMR Genomics
• CEO: Barbara Simionati
• NGS Team Leader: Giorgio Malacrida
• Target Enrichment specialist: Ilena Li Mura
• Variant annotation specialist: Ivano Zara
…and everybody else there, making the whole team special.