Evaluation of CNV detection from targeted next-generation panel sequencing … · 2017-11-19 ·...
Evaluation of CNV detection
from targeted next-generation panel sequencing data
in routine diagnostics
Anna Benet-Pagès
Prague 07.11.2017
Phenotype | OMIM | Locus | CNV
Mendelian (autosomal dominant)
7q11.23 duplication syndrome | 609757 | 7q11.23 | dup
Adult-onset leukodystrophy | 169500 | LMNB1 | dup
CHARGE syndrome | 214800 | 7q21.11/SEMA3E/8q12.2/CHD7 | del/dup
CMT1A | 118220 | 17p12/PMP22 | dup
DGS/VCFS | 188400/192430 | 22q11.2/TBX1 | del
HNPP | 162500 | 17p12/PMP22 | del
Mental retardation | 601545 | 17p13.3/LIS1 | dup
Microduplication 22q11.2 | 608363 | 22q11.2 | dup
Miller-Dieker lissencephaly syndrome | 247200 | 17p13.3/LIS1 | del
Neurofibromatosis, type 1 | 162200 | 17q11.2/NF1 | del/dup
Potocki-Lupski syndrome | 610883 | 17p11.2 | dup
Smith-Magenis syndrome | 182290 | 17p11.2/RAI1 | del
Sotos syndrome | 117550 | 5q35.3/NSD1 | del/dup
Spinocerebellar ataxia type 20 | 608687 | 11q12 | dup
Tuberous sclerosis-1 | 191100 | 9q34.13/TSC1 | del
Williams-Beuren syndrome | 194050 | 7q11.23 | del
Mendelian (autosomal recessive)
Alpha-thalassemia | 141750 | 16p13.3/HBA | del
Beta-thalassemia | 141900 | 11p15/beta-globin | del
Familial juvenile nephronophthisis | 256100 | 2q13/NPHP1 | del
Gaucher disease | 230800 | 1q21/GBA | del
Juvenile Batten disease | 204200 | 16p12.1/CLN3 | del
Pituitary dwarfism | 262400 | 17q24/GH1 | del
Spinal muscular atrophy | 253300 | 5q13/SMN1 | del
Mendelian (X-linked)
Duchenne muscular dystrophy | 310200 | Xp21.2-p21.1/DMD | del/dup
Hemophilia A | 306700 | F8 | inv/del
Hunter syndrome | 309900 | IDS | del/inv
Ichthyosis | 308100 | STS | del
Mental retardation | 300706 | HUWE1 | dup
Pelizaeus-Merzbacher disease | 312080 | PLP1 | del/dup/tri
Progressive neurological symptoms (MR+SZ) | 300260 | MECP2 | dup
Red-green color blindness | 303800 | opsin genes | del
Complex traits
Alzheimer disease | 104300 | APP | dup
Autism | 612200 | 3q24 | inherited homozygous del
Crohn disease | 266600 | HBD-2 | copy number loss
HIV susceptibility | 609423 | CCL3L1 | copy number loss
Mental retardation | 612001 | 15q13.3 | del
Pancreatitis | 167800 | PRSS1 | tri
Parkinson disease | 168600 | SNCA | dup/tri
Psoriasis | 177900 | DEFB | copy number gain
Schizophrenia | 612474 | 1q21.1 | del
Systemic lupus erythematosus | 152700 | FCGR3B | copy number loss
Structural variants in human disease
Calvin Bridges, 1936
narrow slit eye
Methods for structural variant detection
High resolution technologies reveal small-size CNVs
Zhang et al. 2009
Watson genome (2008)
Kidd et al. (2008)
Venter genome (2007)
Korbel et al. (2007)
Redon et al. (2006)
Size distribution of copy number variations (CNVs) larger than 100 bp.
Smaller structural variants are the most frequent.
The era of whole genome sequencing
In the meantime…
target region capture
whole genome
CNV detection from targeted-capture data
Issues:
• CNV detection from exon capture approaches depends solely on read depth data
• Exon capture introduces systematic noise in read depth data
• Many different kits with different enrichment efficiency
• Coverage bias between sequencing runs and within samples of the same run
• Single exon events are extremely difficult to detect
• Control individuals are difficult to obtain (reference set / validation)
• Validation is expensive
These issues decrease the sensitivity and accuracy required for routine diagnostics
AGE, BicSeq, BreakDancer, Breakpointer, Breakseq, Canoes, Clamms, Clever, ClipCrop,
cn.MOPS, CNAnorm, CNAseg, CND, CNV_TV, CNVnator, CNVer, HugeSEQ,
hydra, inGAP_sv, JointSLM, Matchclip, modil, mogul, mrcanavar, Patchwork, pemer,
ReadDepth, rSW_seq, segseq, seqcbs, cnvHiTSeq, cnvrd, CNV-seq,
conserting, CONTROL_FREEC, cops, copySeq, crest, ERDS, codex,
EWT_RDXplorer, GasvPRO, GENSENG
CNV detection methods
Use a combination of several detection tools
Noll et al., npj Genomic Medicine 2016
"meta-CNV-caller"
• applicable to capture data
• calling of rare CNVs
• easy to integrate (take bam files as input)
• easy handling (installation / running time)
• multi-sample usage (possibility to normalize against reference set)
• tools should use different statistical models
CNV detection methods: general considerations
Which tool should I choose?
• ExomeDepth
extremely sensitive and robust against samples that do not correlate with the reference
• Canoes
has a high sensitivity for small deletions, high performance in low coverage regions and with few reference samples
• Clamms
corrects for GC content and mappability, divides large exons into smaller regions and also calls common CNVs
• Codex
corrects for GC content and mappability, also calls common CNVs, uses no HMM for segmentation (all other tools use HMMs)
• In-house method
is well adapted to in-house data, screens for heterozygosity, corrects for GC content, exon score depends on previous analyses
Meta-Tool CNV Detection Pipeline
Profit from the advantages of single tools
Metric | ExomeDepth | Clamms | Canoes | Codex | In-house | Combination
Precision | 45.63% | 68.57% | 96.77% | 64.75% | 40.82% | 95.16%
Sensitivity | 90.38% | 46.15% | 57.69% | 63.46% | 76.92% | 80.82%
Utilization of five independent detection tools increases sensitivity under the criterion "at least 2 tools call the same CNV"
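The "at least 2 tools call the same CNV" criterion can be sketched as a simple vote. This is a minimal illustration, not the pipeline's actual implementation: the function name `consensus_calls`, the representation of an event as a `(gene, exon, cnv_type)` tuple, and the example calls are all assumptions, and exact per-exon matching is a simplification of whatever overlap rule the real pipeline uses.

```python
from collections import defaultdict

def consensus_calls(calls_by_tool, min_tools=2):
    """Keep events reported by at least `min_tools` different tools.

    calls_by_tool maps a tool name to a set of (gene, exon, cnv_type) tuples.
    Returns a dict: event -> sorted list of supporting tools.
    """
    support = defaultdict(set)
    for tool, calls in calls_by_tool.items():
        for event in calls:
            support[event].add(tool)
    return {
        event: sorted(tools)
        for event, tools in support.items()
        if len(tools) >= min_tools
    }

# Hypothetical calls, for illustration only.
calls_by_tool = {
    "ExomeDepth": {("BRCA1", 11, "del"), ("NF1", 3, "dup")},
    "Canoes": {("BRCA1", 11, "del")},
    "Codex": {("BRCA1", 11, "del"), ("PMP22", 2, "dup")},
}
print(consensus_calls(calls_by_tool))
```

Only the BRCA1 event survives here, because the other two events are each supported by a single tool.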
Reference Sets and data normalization
[Figure: coverage (y-axis, 0 to 2500) per SMARCB1 exon (x-axis, exons 1 to 9) for two kits: Illumina Cancer Kit and Agilent Custom Design]
different reference sets for different kits / enrichment methods
normalization against samples from the same sequencing run to improve robustness against workflow conditions
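Normalization against a reference set from the same run can be illustrated with a per-exon log2 ratio. The sketch below is one common read-depth approach, not necessarily what any of the five tools does internally: the function names, the total-depth scaling, and the use of the reference median are assumptions, and all depths are assumed to be positive.

```python
import math
from statistics import median

def normalize(depths):
    """Scale per-exon depths to fractions of the sample total,
    which removes differences in overall sequencing yield."""
    total = sum(depths)
    return [d / total for d in depths]

def log2_ratios(sample_depths, reference_depths):
    """Per-exon log2 ratio of the sample against the median of the
    reference samples (all normalized first). Depths must be > 0."""
    sample = normalize(sample_depths)
    refs = [normalize(r) for r in reference_depths]
    ratios = []
    for i, s in enumerate(sample):
        ref_median = median(r[i] for r in refs)
        ratios.append(math.log2(s / ref_median))
    return ratios

# Hypothetical depths: the sample has half the depth at the third exon.
sample = [1000, 1000, 500, 1000]
references = [[1000] * 4, [2000] * 4, [1500] * 4]
ratios = log2_ratios(sample, references)
print([round(r, 2) for r in ratios])
```

The third exon sits exactly one log2 unit below the others, the pattern expected for a heterozygous deletion. Because the reference samples come from the same run and kit, kit-specific coverage profiles cancel out in the ratio.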
CNV Pipeline Structure
Gene Panels:
• TruSight Cancer (94 genes)
• Agilent Custom (1564 genes)
MiSeq / NextSeq
Mapping to hg19 (BWA/GATK)
BAM Files
Meta-CNV calling: ExomeDepth, Canoes, Codex, Clamms, In-house
Normalization, Statistical Model, Segmentation
CALLS
Final CNV Calls
Filter/Tag Criteria:
Overlap with pseudogene region
Location in a biased bait-enrichment region
Coverage <30X
CNV frequency within the library preparation/sequencing pool
CNV number / sample
1 exon events: duplications (filtered) / 1 exon deletions (tagged)
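A few of these criteria can be sketched as a filter/tag step. The slide only states that 1-exon duplications are filtered and 1-exon deletions are tagged; the choice to tag (rather than filter) pseudogene and bait-region overlaps, the `CnvCall` fields, and the tag names are assumptions made for illustration. The pool-frequency and calls-per-sample criteria are omitted.

```python
from dataclasses import dataclass, field

MIN_COVERAGE = 30  # from the criteria list: coverage <30X

@dataclass
class CnvCall:
    gene: str
    cnv_type: str            # "del" or "dup"
    n_exons: int
    mean_coverage: float
    in_pseudogene_region: bool = False
    in_biased_bait_region: bool = False
    tags: list = field(default_factory=list)

def filter_or_tag(call):
    """Return False if the call is filtered out; otherwise add tags
    to call.tags and return True."""
    if call.mean_coverage < MIN_COVERAGE:
        return False
    if call.n_exons == 1 and call.cnv_type == "dup":
        return False  # 1-exon duplications are filtered
    if call.n_exons == 1 and call.cnv_type == "del":
        call.tags.append("single_exon_del")  # 1-exon deletions are tagged
    if call.in_pseudogene_region:
        call.tags.append("pseudogene_overlap")  # assumption: tag, not filter
    if call.in_biased_bait_region:
        call.tags.append("biased_bait_region")  # assumption: tag, not filter
    return True

call = CnvCall("PMS2", "del", 1, 120.0, in_pseudogene_region=True)
print(filter_or_tag(call), call.tags)
```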
CNV Calling Performance
Quality thresholds for single exons
Definition of special quality thresholds for single-exon events to minimize false negatives
Use different weighting for single-exon calls depending on the detection tool:
• high scores for Codex and Canoes
• average scores for Clamms and the in-house method
• low scores for ExomeDepth
Use negative scoring for single-exon calls if one tool calls multiple hits in one gene
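The weighting scheme for single-exon calls can be sketched as a small scoring function. The ordering of the weights (Codex and Canoes high, Clamms and in-house average, ExomeDepth low) follows the slide; the numeric values, the penalty size, and the function name are hypothetical, since the talk does not give them.

```python
# Hypothetical weights; only their relative order comes from the slide.
TOOL_WEIGHTS = {
    "Codex": 3,
    "Canoes": 3,
    "Clamms": 2,
    "In-house": 2,
    "ExomeDepth": 1,
}
MULTI_HIT_PENALTY = 2  # hypothetical size of the negative score

def single_exon_score(tools_calling, hits_in_gene_by_tool):
    """Score a single-exon call.

    tools_calling: tools that reported this exon.
    hits_in_gene_by_tool: tool -> number of single-exon hits that tool
    reported in the same gene (missing tools count as one hit).
    """
    score = 0
    for tool in tools_calling:
        score += TOOL_WEIGHTS[tool]
        if hits_in_gene_by_tool.get(tool, 1) > 1:
            score -= MULTI_HIT_PENALTY  # many hits in one gene look like noise
    return score

print(single_exon_score(["Codex", "Canoes"], {}))
print(single_exon_score(["ExomeDepth", "Clamms"], {"ExomeDepth": 4}))
```

A call would then be kept if its score passes a threshold chosen on validated data, for example MLPA-confirmed events.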
Define reliable vs. non-reliable regions
identification of reliable regions by assessment of capture efficiency using a reference set of CNV negative patients to minimize false positives
[Figure: per-exon read depth of the reference set and of the sample, calls per exon, and number of dup and del calls; some regions cannot be analyzed]
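One way to identify reliable regions from a reference set of CNV-negative patients is to check how consistently each exon is captured. The sketch below uses the coefficient of variation of the depth per exon; this metric, both thresholds, and the function name are assumptions, since the talk does not say how capture efficiency was assessed. Depths are assumed to be already normalized for library size.

```python
from statistics import mean, pstdev

def reliable_exons(reference_depths, max_cv=0.2, min_mean_depth=30):
    """Flag exons as reliable when CNV-negative reference samples show
    sufficient and consistent depth.

    reference_depths: exon label -> list of depths across reference samples.
    Returns: exon label -> True (reliable) or False (non-reliable).
    """
    result = {}
    for exon, depths in reference_depths.items():
        m = mean(depths)
        cv = pstdev(depths) / m if m > 0 else float("inf")
        result[exon] = m >= min_mean_depth and cv <= max_cv
    return result

# Hypothetical depths for three exons across four reference samples.
reference = {
    "E1": [100, 110, 90, 105],   # stable capture
    "E2": [100, 20, 250, 60],    # highly variable capture
    "E3": [10, 12, 9, 11],       # stable but too shallow
}
print(reliable_exons(reference))
```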
CNV calling in Pseudogenes
[Figure: pseudogene region covering exons 15 to 11, shown with per-exon read depth of the reference set and of the sample, calls per exon, and number of dup and del calls; the region cannot be analyzed]
CNV calling artifacts?
[Figure: per-exon read depth of the reference set and of the sample, calls per exon, and number of dup and del calls]
CNV Pipeline Evaluation
Sensitivity: 96.20%
Specificity: 99.66%
Precision: 92.69%
• >2000 MLPAs were performed in 69 genes (ABCC6, APC, APP, ARHGEF15, ARID1B, ATM, BRCA1, BRCA2, CDH1, CHAT, CHCHD10, CHEK2, CHRNA4, COL3A1, COX10, CTC1, DKC1, DMD, DSG2, EPCAM, ERCC6, FBN1, FGFR3, FOXC1, FXN, GJB1, HGSNAT, IKBKG, KCNH2, KCNQ2, MAPK3, MECP2, MET, MLH1, MPV17, MSH2, MSH6, MUTYH, NAA10, NF1, NRXN1, NSD1, PAFAH1B1, PCDH19, PLOD1, PMP22, PMS2, POMK, PRRT2, RAB39B, RYR1, SACS, SCN1A, SETX, SGCG, SMAD4, SMARCB1, SPAST, SPG11, SRPX2, STK11, TENM3, TGFBR2 E4, TNXB, TSPAN7, TTR, VPS13B, WFS1, WWOX)
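For reference, the three reported metrics follow from the standard confusion-matrix definitions when MLPA is taken as the truth. The counts in the sketch below are hypothetical and are not the counts behind the figures on this slide.

```python
def evaluation_metrics(tp, fp, tn, fn):
    """Standard definitions, with MLPA results as the truth set.

    tp: CNV called and confirmed     fp: CNV called but not confirmed
    tn: no call and no CNV           fn: CNV present but not called
    """
    return {
        "sensitivity": tp / (tp + fn),   # share of true CNVs that were called
        "specificity": tn / (tn + fp),   # share of negatives left uncalled
        "precision": tp / (tp + fp),     # share of calls that were true
    }

# Hypothetical counts, for illustration only.
metrics = evaluation_metrics(tp=90, fp=10, tn=890, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```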
CNV Pipeline Evaluation
False positives:
Gene | # exons | CNV type | Nr. tools called
PMS2 (2x) | 3 (pseudogene) | del | 2
OPA1 | 1 | del | 2
SGCG | 1 | dup | 2
SMARCB1 | 1 | del | 2
ARID1A | 1 | dup | 4
False negatives:
Gene | # exons | CNV type
OPA1 | 1 | del
FOXC1 | dup 6p25.3 [485 Kb] | dup
So… where is the good news?
Meta-CNV-caller: multi calls for one event
[Figure: CNV calls with per-exon read depth of sample and reference; one event reported as 2 calls, E1 – E12 and E2 – E15]
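When several tools report the same event with different boundaries, the calls can be collapsed into one event by merging overlapping exon ranges. The sketch below is an assumed approach, not the pipeline's documented behavior; the tuple layout, the function name, and the gene label `GENE_X` are hypothetical.

```python
def merge_calls(calls):
    """Merge overlapping calls of the same gene and CNV type.

    calls: list of (gene, cnv_type, first_exon, last_exon, tool).
    Returns a list of dicts with the merged exon range and supporting tools.
    """
    merged = []
    # Sorting groups calls by gene and type, then orders them by first exon.
    for gene, cnv_type, first, last, tool in sorted(calls):
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev["gene"] == gene
            and prev["cnv_type"] == cnv_type
            and first <= prev["last_exon"]
        ):
            prev["last_exon"] = max(prev["last_exon"], last)
            prev["tools"].add(tool)
        else:
            merged.append({
                "gene": gene,
                "cnv_type": cnv_type,
                "first_exon": first,
                "last_exon": last,
                "tools": {tool},
            })
    return merged

# Two tools report one deletion with slightly different boundaries.
calls = [
    ("GENE_X", "del", 1, 12, "Codex"),
    ("GENE_X", "del", 2, 15, "Canoes"),
]
print(merge_calls(calls))
```

The two calls collapse into a single deletion spanning exons 1 to 15, supported by both tools.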
Gene Panel Dx for rare mendelian disorders and hereditary cancer
CNVs can be reliably analyzed in 1/3 of the genes of our capture kit
CNVs clarified the underlying phenotype in 8% of the cases
CNV analysis on 1600 individuals within the routine Dx
502 CNV calls
filter
85 CNVs subjected to specialist examination
interpretation
40 CNVs clarified phenotype (figure breakdown: 33 and 7)
Increase in diagnostic yield of 3%
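The counts on this slide can be turned into percentages at each stage of the funnel. Which ratios the reported 8% and 3% refer to is not stated in the talk; reading them as 40/502 and 40/1600 is an assumption, and the helper name `pct` is hypothetical.

```python
individuals = 1600   # individuals analyzed in routine Dx
cnv_calls = 502      # CNV calls before filtering
examined = 85        # CNVs subjected to specialist examination
clarified = 40       # CNVs that clarified the phenotype

def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)

print("calls reaching examination:", pct(examined, cnv_calls), "%")
print("examined calls clarifying the phenotype:", pct(clarified, examined), "%")
print("calls clarifying the phenotype:", pct(clarified, cnv_calls), "%")
print("individuals with a clarifying CNV:", pct(clarified, individuals), "%")
```

Under that reading, 40 of 502 calls is about 8%, and 40 of 1600 individuals is 2.5%, in line with the roughly 3% gain in diagnostic yield.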
Summary
CNV detection from capture panel data is possible with "meta-CNV-callers"
Rigorous filtering is required
Parallel analysis of SNVs and CNVs increases the diagnostic yield in routine Dx
Interpretation is challenging due to the lack of proper databases (only array data)
Anke Nissen, Janine Graf, Florentine Scharf, Tobias Wolfrom, Andreas Laner, Melanie Locher