Detecting and Genotyping CNV
Population Approaches to Detecting and Genotyping Copy Number Variation
Lachlan Coin, July 2010
Outline: population-haplotype approach to CNV detection and genotyping; application to SNP and CGH data; application to NGS sequence data
cnvHap approach to CNV discovery and genotyping
Coin et al., Nature Methods 7, 541-546 (2010)
Example of trained model
cnvHap models haploid CN transitions
Specify a per-base global transition rate matrix over copy-number states 0 to 4 (rows: copy number from; columns: copy number to), with entries q00, q10, ...
Rate matrix multiplied by a position-specific scalar rate
Values trained using EM, following the approach of Klosterman et al., used in Xrate for finding substitution rates
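The sketch below shows one way a global rate matrix and a position-specific scalar could combine into copy-number transition probabilities. It assumes the standard continuous-time Markov chain relation P = exp(Q * rate * distance), which the slide does not state explicitly. The function names, the adjacent-state-only structure of Q and the numeric rates are all hypothetical, not taken from cnvHap.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, squarings=10, terms=12):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    scale = 2.0 ** squarings
    small = [[x / scale for x in row] for row in m]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes small^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, small)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def make_rate_matrix(n_states=5, rate=1e-6):
    """Hypothetical Q: equal rate between adjacent copy-number states only."""
    q = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        for j in (i - 1, i + 1):
            if 0 <= j < n_states:
                q[i][j] = rate
        q[i][i] = -sum(q[i])  # each row of a rate matrix sums to zero
    return q


def transition_probs(q, site_rate, distance_bp):
    """P[i][j]: probability of moving from CN state i to j over distance_bp bases."""
    scaled = [[x * site_rate * distance_bp for x in row] for row in q]
    return mat_exp(scaled)


if __name__ == "__main__":
    p = transition_probs(make_rate_matrix(), site_rate=2.0, distance_bp=1000)
    print([round(x, 6) for x in p[2]])
```

A larger position-specific rate makes a copy-number change more likely at that position, while the relative preference among target states stays fixed by the shared matrix.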
cnvHap joint model of CNV + SNP haplotypes
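Cluster positions modelled using a linear model in the following functions of genotype g:

\begin{align*}
f_0(g) &= 1 \\
f_1(g) &= \log(\mathrm{CN}(g)/2) \\
f_2(g) &= \left(\log(\mathrm{CN}(g)/2)\right)^2 \\
f_3(g) &= \mathrm{bfrac}(g) \\
f_4(g) &= \mathrm{bfrac}(g)\,(1 - \mathrm{bfrac}(g)) \\
f_5(g) &= \mathrm{bfrac}(g)\,(\mathrm{bfrac}(g) - 0.5)\,(\mathrm{bfrac}(g) - 1)
\end{align*}

The cluster parameters, including r_m(g) and b_m(g), are given by a coefficient matrix multiplied by the vector (f_0(g), ..., f_5(g)).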
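As an illustration, the basis functions can be evaluated for a genotype as sketched below. The sketch assumes that bfrac(g) is the fraction of B alleles among the copies in genotype g, and that the factors in f4 and f5 read (1 - bfrac), (bfrac - 0.5) and (bfrac - 1). The function names and the coefficient values in the usage line are hypothetical. The slide does not say how CN = 0 is handled, so the sketch simply rejects it.

```python
import math


def cluster_features(copy_number, b_count):
    """Return [f0, ..., f5] for a genotype with copy_number copies, b_count of them B."""
    if copy_number <= 0:
        # log(CN/2) is undefined for CN = 0; its treatment is not given on the slide
        raise ValueError("copy_number must be positive in this sketch")
    if not 0 <= b_count <= copy_number:
        raise ValueError("b_count must lie between 0 and copy_number")
    log_cn = math.log(copy_number / 2)
    bfrac = b_count / copy_number  # assumed meaning of bfrac(g)
    return [
        1.0,
        log_cn,
        log_cn ** 2,
        bfrac,
        bfrac * (1 - bfrac),
        bfrac * (bfrac - 0.5) * (bfrac - 1),
    ]


def linear_predict(coefficients, features):
    """Dot product of a coefficient vector with the feature vector."""
    return sum(c * f for c, f in zip(coefficients, features))


if __name__ == "__main__":
    feats = cluster_features(3, 1)  # e.g. genotype AAB
    # Hypothetical coefficients, for illustration only
    print(linear_predict([0.0, 0.8, 0.1, 0.0, 0.0, 0.0], feats))
```

For the normal diploid heterozygote (CN = 2, one B allele) every term except f0, f3 and f4 vanishes, so the intercept and the B-allele terms alone set that cluster's position.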
Model fitted using ridge regression, carried out at each iteration of the EM algorithm
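A minimal sketch of a ridge fit of the kind that could run inside each M-step is given below. It assumes a weighted least-squares form, with the weights standing in for posterior genotype probabilities from the E-step. The weighting, the penalty on every coefficient (including the intercept) and the names `ridge_fit` and `solve` are assumptions for illustration, not details from the slides.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge_fit(rows, targets, weights, lam):
    """Minimise sum_i w_i (y_i - x_i . beta)^2 + lam * |beta|^2 and return beta."""
    p = len(rows[0])
    xtx = [[lam if i == j else 0.0 for j in range(p)] for i in range(p)]
    xty = [0.0] * p
    for x, y, w in zip(rows, targets, weights):
        for i in range(p):
            xty[i] += w * x[i] * y
            for j in range(p):
                xtx[i][j] += w * x[i] * x[j]
    # Normal equations: (X^T W X + lam I) beta = X^T W y
    return solve(xtx, xty)


if __name__ == "__main__":
    rows = [[1.0, x] for x in (0.0, 1.0, 2.0, 3.0)]
    targets = [1.0 + 2.0 * r[1] for r in rows]
    print(ridge_fit(rows, targets, [1.0] * 4, lam=0.0))
    print(ridge_fit(rows, targets, [1.0] * 4, lam=10.0))
```

The penalty keeps the system solvable when few samples carry a given genotype, which is a plausible reason to prefer ridge over ordinary least squares for rare copy-number states.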
Using Illumina SNP arrays
Combined Illumina and Agilent arrays
[Figure: paired Illumina and Agilent panels]
Some CNVs exhibit shared structure
Improved CNV genotyping accuracy
[Figure: cumulative frequency of squared Pearson correlation]
A deletion at 16p11.2 in a patient with extreme obesity
[Figure: log2 ratio along chromosome 16, 28.9 Mb to 30.7 Mb, with tracks for MLPA probes and segmental duplication]
Estimated by aCGH to be 546kb-700kb
Flanked by segmental duplication (>99% sequence identity)
Probably arises by NAHR, implying deletion is 739kb
BMI = 29.2 kg/m^2 at age 7
Learning difficulties, delayed speech
RG Walters et al. Nature 463, 671-675 (2010) doi:10.1038/nature08727
16p11.2 deletions in obesity and population cohorts

Cohort                                          Obese     Lean/normal weight
French child obesity case:control               4/643     0/530
British extreme early-onset obesity (SCOOP)     3/931     -
French adult obesity case:control               4/705     0/669
French bariatric surgery patients               2/141     -
Swedish discordant siblings                     2/159     0/140
Population cohorts (NFBC1966, CoLaus, EGCUT)    3/1592    1/6235

Obesity: P = 5.8x10^-7, OR = 29.8 [3.9-225]
Morbid obesity: P = 6.4x10^-8, OR = 43.0 [5.6-329]
Coverage affected by GC content
Regression model fit to correct for GC bias
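The slides do not specify the form of the regression, so the sketch below assumes the simplest case: a straight-line fit of window depth on GC fraction, with each window rescaled by the ratio of mean depth to predicted depth. The function names and example numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def gc_correct(depths, gc_fractions):
    """Rescale each window's depth by mean depth / depth predicted from its GC."""
    intercept, slope = fit_line(gc_fractions, depths)
    mean_depth = sum(depths) / len(depths)
    corrected = []
    for depth, gc in zip(depths, gc_fractions):
        predicted = intercept + slope * gc
        if predicted <= 0:
            # A linear fit can predict non-positive depth at extreme GC values
            raise ValueError("predicted depth is not positive")
        corrected.append(depth * mean_depth / predicted)
    return corrected


if __name__ == "__main__":
    gc = [0.3, 0.4, 0.5, 0.6]
    depth = [32.0, 36.0, 40.0, 44.0]  # depth rising with GC, no CNV signal
    print(gc_correct(depth, gc))
```

When depth depends on GC alone, the corrected values are flat at the mean depth, so any remaining deviation is a candidate copy-number signal or residual spatial variation.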
Loess curves fit to remove residual spatial variation of coverage
Detecting CNVs with NGS data
[Figure panels: depth/haploid coverage; B-allele frequency]
NGS versus CGH data
[Figure panels: NGS data, chrom1:350mb-351mb; CGH data, chrom1:350mb-351mb]
NGS vs CGH data
Haplotype structure of deletion
NGS amplification
[Figure: depth/coverage]
With consistent break-points in population
Imputation error rate
Switch error rate
Polyploid phasing and imputation
Conclusions
Population-haplotype model enables joint CNV discovery and genotyping using array data.
Preliminary results indicate this will also help using NGS data.
Combining information from multiple platforms improves sensitivity.
Imputation still works for ploidy > 2; phasing becomes more difficult.
Acknowledgements
Evangelos Bellos, Shu-Yi Su, Robin Walters, David Balding (UCL), Rob Sladek (McGill), Julian Asher, Alex Blakemore, Adam de Smith, Philippe Froguel, Julia El-Sayed Moustafa