The “Hidden Heritability” Problem

The “Hidden Heritability” Problem: Using Tumor Studies as a Model for Future GWAS Success. Eric D. Tycksen, Genome Technology Access Center, Washington University in St. Louis

Transcript of The “Hidden Heritability” Problem

Page 1: The “Hidden Heritability” Problem

The “Hidden Heritability” Problem

Using Tumor Studies as a Model for Future GWAS Success

Eric D. Tycksen
Genome Technology Access Center
Washington University in St. Louis

Page 2: The “Hidden Heritability” Problem

“Hidden Heritability”

Refers to the portion of the heritability of complex phenotypes that cannot be accounted for by SNP GWAS alone.

Manolio et al., Finding the missing heritability of complex diseases, Nature, 461: 747–753, 2009

Many believe that copy number variation may account for some of this “hidden heritability.” Epigenetics may explain the rest.

Page 3: The “Hidden Heritability” Problem

Goals

Demonstrate how one can take the worst-case scenario for copy number analysis and uncover some “hidden heritability” through thorough examination of paired primary and metastatic tumors. Once you can decipher tumor variation, normal populations are easy in comparison.

Page 4: The “Hidden Heritability” Problem

SNP calling in GenomeStudio

All X,Y intensities are normalized and polar-transformed into R and Theta coordinates:

$R = X_{\text{normalized}} + Y_{\text{normalized}}$

$\theta = \frac{2}{\pi}\arctan\left(Y_{\text{normalized}} / X_{\text{normalized}}\right)$

Log R Ratios and B-allele Frequencies are derived from R and Theta and are the typical raw data used for copy number analysis with Illumina Infinium assays:

$\text{Log R Ratio (LRR)} = \log_2\left(R_{\text{subject}} / R_{\text{expected}}\right)$

$\text{B-allele Frequency (BAF)} = \text{standardized } \theta$

DA Peiffer et al., High-resolution genomic profiling of chromosomal aberrations using Infinium whole-genome genotyping, Genome Research, 16: 1136–1148, 2006
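A minimal Python sketch of the transformations above, for readers who want to see the arithmetic. The cluster-center thetas (`theta_aa`, `theta_ab`, `theta_bb`) and `r_expected` are assumed inputs here; in practice GenomeStudio derives them from its cluster file, and this is not its exact implementation. "Standardized theta" is assumed to mean piecewise-linear interpolation of theta between the AA, AB, and BB cluster centers.

```python
import math


def polar(x_norm: float, y_norm: float) -> tuple[float, float]:
    """Return (R, theta) from normalized X/Y intensities."""
    r = x_norm + y_norm
    # (2/pi) * arctan(Y/X); atan2 gives the same value for non-negative
    # intensities and avoids dividing by zero when X == 0.
    theta = (2.0 / math.pi) * math.atan2(y_norm, x_norm)
    return r, theta


def log_r_ratio(r_subject: float, r_expected: float) -> float:
    """LRR = log2(R_subject / R_expected); r_expected is an assumed input."""
    return math.log2(r_subject / r_expected)


def b_allele_frequency(theta: float, theta_aa: float,
                       theta_ab: float, theta_bb: float) -> float:
    """Assumed standardization: map theta onto [0, 1] using the genotype
    cluster centers (AA -> 0, AB -> 0.5, BB -> 1), linear in between."""
    if theta <= theta_aa:
        return 0.0
    if theta >= theta_bb:
        return 1.0
    if theta < theta_ab:
        return 0.5 * (theta - theta_aa) / (theta_ab - theta_aa)
    return 0.5 + 0.5 * (theta - theta_ab) / (theta_bb - theta_ab)
```

For example, equal normalized X and Y give theta = 0.5, which sits on a heterozygous cluster centered at 0.5 and so yields a BAF of 0.5.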

Page 5: The “Hidden Heritability” Problem

Getting Started

Five samples were genotyped on Illumina’s Human 1M-Omni Quad assay as part of an ongoing collaboration with the Genome Center:
• 1 normal sample
• 1 primary tumor
• 3 metastatic tumors, from the spine, liver, and adrenal gland

All data were exported from GenomeStudio into Partek Genomics Suite v. 6.5.

Page 6: The “Hidden Heritability” Problem

Explore your Data

Always begin by creating PCA plots of the log ratios and the B-allele frequencies. Samples should form easily distinguished clusters based on a known categorical factor.
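A generic way to compute the PCA coordinates for such plots is sketched below with NumPy, via the singular value decomposition of the centered data. This is not Partek's implementation; the samples-by-markers layout and the assumption that missing values have already been removed are illustrative choices.

```python
import numpy as np


def pca_scores(data: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project samples (rows) onto the top principal components.

    `data` is assumed to be a samples x markers matrix of log R ratios
    or B-allele frequencies with no missing values.
    """
    # Center each marker across samples so PCs describe between-sample variation.
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes; u * s gives each sample's coordinates.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]
```

Run it separately on the LRR matrix and the BAF matrix, then plot column 0 against column 1 and color the points by the known factor (for this study, tissue of origin).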

Page 7: The “Hidden Heritability” Problem

Partek Genomics Suite

Use the normalized intensities exported from GenomeStudio to generate shifted paired log2 intensity ratios or shifted paired Log R Ratios. This makes the visualization and detection of copy number aberrations more intuitive, but it assumes the genome of interest is largely diploid.

By default, all log ratio data should be corrected for GC content (GC waveform correction).

Carefully examine the relationship between the B-allele frequencies and the log ratio data for every sample or sample group.
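The steps on this slide can be sketched in NumPy as follows, under stated assumptions. The paired ratio is taken marker by marker as log2(tumor / normal). GC waveform correction is simplified here to removing a linear trend of the log ratio against each marker's local GC fraction; production tools use more elaborate, windowed models. "Shifting" is interpreted, as an assumption, as re-centering so the genome-wide median is 0, which only makes sense if the genome is largely diploid.

```python
import numpy as np


def paired_log2_ratio(tumor: np.ndarray, normal: np.ndarray) -> np.ndarray:
    """Marker-wise log2(tumor / normal) from normalized intensities."""
    return np.log2(tumor / normal)


def gc_correct(log_ratio: np.ndarray, gc_content: np.ndarray) -> np.ndarray:
    """Simplified GC correction: subtract a least-squares line fitted to
    log ratio versus local GC fraction (gc_content is an assumed input)."""
    slope, intercept = np.polyfit(gc_content, log_ratio, deg=1)
    return log_ratio - (slope * gc_content + intercept)


def shift_to_diploid(log_ratio: np.ndarray) -> np.ndarray:
    """Assumed meaning of 'shifted': re-center so the median marker reads 0."""
    return log_ratio - np.median(log_ratio)
```

Applied in order (ratio, then GC correction, then shift), a GC-driven trend is removed before the baseline is set, so the baseline is not pulled by GC-rich or GC-poor regions.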

Page 8: The “Hidden Heritability” Problem

Basic QC of Normal Samples

For normal diploid samples, copy number analysis is straightforward.

Segmentation or HMM tools:
• Partek GS
• Nexus
• QuantiSNP
• PennCNV
• CBS in R or JMP Genomics
• FAÇADE

For copy number genotypes:

$\text{GT} = (\text{A allele})(\text{CN})(1 - \text{BAF}) + (\text{B allele})(\text{CN})(\text{BAF})$
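For the discrete states expected in normal samples, the genotype formula can be turned into a genotype string. In this Python sketch the A allele contributes CN(1 − BAF) copies and the B allele CN·BAF copies; rounding to whole copies is an assumption that suits clean diploid data but not tumors. The function name is hypothetical.

```python
def copy_number_genotype(cn: float, baf: float) -> str:
    """Spell out a copy number genotype, e.g. CN = 3, BAF = 0.33 -> 'AAB'.

    B copies = CN * BAF, rounded; A copies are the remainder, so the
    A and B counts always add up to the rounded total copy number.
    """
    n_copies = round(cn)
    n_b = round(n_copies * baf)
    n_a = n_copies - n_b
    return "A" * n_a + "B" * n_b
```

A total copy number of 0 returns an empty string, i.e. a homozygous deletion.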

Page 9: The “Hidden Heritability” Problem

Basic QC of Tumor Samples

The B-allele frequency exhibits signs of unexplained variation due to possible normal contamination, intra-tumor heterogeneity, and polyploidy that are not readily apparent in the log ratios.

All samples are from the same individual, yet notice that the allele calls are not the same. The heterozygous sample is the normal sample. The tumors are homozygous BB… or are they? We need more information.

Page 10: The “Hidden Heritability” Problem

Partek’s Allele-Specific Copy Number

Generate paired allele-specific copy number from the SNP data and the normalized allele intensities to estimate the number of copies of each allele and their proportions.

$\text{AsCN} = (\text{normalization parameter})\,(\text{allele intensity} / \text{reference intensity})$

$\text{AsCN(max)} = \max(\text{AsCN of A allele},\ \text{AsCN of B allele})$

$\text{AsCN(min)} = \min(\text{AsCN of A allele},\ \text{AsCN of B allele})$

Partek Incorporated, White Paper: Allele Specific Copy Number, St. Louis, 2009
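The three formulas above can be sketched for a single SNP in Python. The value of the normalization parameter (`norm`) and the choice of reference intensities (for instance, the paired normal's allele intensities) are assumptions, not Partek's documented defaults.

```python
def allele_specific_cn(a_intensity: float, b_intensity: float,
                       a_ref: float, b_ref: float,
                       norm: float = 1.0) -> tuple[float, float]:
    """Return (AsCN max, AsCN min) for one SNP.

    AsCN = norm * (allele intensity / reference intensity), computed for
    the A and B alleles separately; `norm` is a hypothetical stand-in for
    the normalization parameter.
    """
    as_a = norm * a_intensity / a_ref
    as_b = norm * b_intensity / b_ref
    return max(as_a, as_b), min(as_a, as_b)
```

Reporting max and min instead of A and B makes regions comparable even when different SNPs have the gained or lost allele on different sides (A at one SNP, B at the next).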

Page 11: The “Hidden Heritability” Problem

Allele-Specific Copy Number

Max Allele   Min Allele   Proportion   Description
1            1            0            Normal
1            0            1            Loss of one allele (LOH)
1            0.5          0.33         Loss of one allele (LOH) in 50% mixed tissue
1.5          0.5          0.5          Copy-neutral LOH in 50% mixed tissue
1.5          1            0.25         Possibly polyploid
2            0            1            Copy-neutral LOH
2            0.5          0.6          Possibly polyploid
2            1            0.33         Gain of one allele
2            2            0            Amplification of both alleles

Partek Incorporated, White Paper: Allele Specific Copy Number, St. Louis, 2009

Normal tissue contamination has a compression effect on total and allele-specific copy number (AsCN).
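The compression effect can be illustrated with a simple linear mixing model, in which each normal cell contributes one copy of each allele. This model is an assumption, not necessarily how Partek computes the table. Under it, LOH (1, 0) and copy-neutral LOH (2, 0) in 50% mixed tissue compress to (1, 0.5) and (1.5, 0.5), the two "50% mixed tissue" rows in the table. The proportion is computed here as (max − min) / (max + min), an assumed definition that matches the Normal, LOH, copy-neutral LOH, gain, and amplification rows.

```python
def mixed_ascn(tumor_max: float, tumor_min: float,
               normal_fraction: float) -> tuple[float, float]:
    """Expected (max, min) AsCN when tumor is diluted by diploid normal tissue.

    Assumes linear mixing: the normal fraction carries one copy of each
    allele, and the tumor fraction carries its own allele counts.
    """
    f = normal_fraction
    obs_max = f * 1.0 + (1.0 - f) * tumor_max
    obs_min = f * 1.0 + (1.0 - f) * tumor_min
    return obs_max, obs_min


def allelic_proportion(max_ascn: float, min_ascn: float) -> float:
    """Assumed imbalance measure: (max - min) / (max + min); 0 means balanced."""
    total = max_ascn + min_ascn
    return 0.0 if total == 0 else (max_ascn - min_ascn) / total
```

Note how contamination pulls both alleles toward 1, which is why a homozygous-looking tumor call can hide LOH in a mixed sample.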

Page 12: The “Hidden Heritability” Problem

Advanced Tumor QC

AsCN suggests 25% mixed tissue on the p-arm and 50% on the q-arm. This difference is indicative of intra-tumor heterogeneity and normal tissue contamination.

AsCN also suggests copy-neutral LOH on the q-arm, while the LRR suggests an amplification. This is indicative of polyploidy.

Amplification on the q-arm and normality on the p-arm in the LRR, together with the BAF, suggest that this chromosome is most likely a mixture of trisomy and tetrasomy.
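One way to read mixed-tissue percentages off AsCN is to invert a linear mixing model, sketched below in Python. It assumes diploid normal tissue (one copy of each allele) and a known tumor state for the min allele (0 for LOH or copy-neutral LOH). The function name is hypothetical, and this illustrates the arithmetic rather than how the figures on this slide were necessarily derived.

```python
def normal_fraction_from_min(obs_min: float, tumor_min: float = 0.0) -> float:
    """Estimate the normal-tissue fraction f from the observed min-allele AsCN.

    Assumed model: obs_min = f * 1 + (1 - f) * tumor_min,
    so f = (obs_min - tumor_min) / (1 - tumor_min).
    """
    if tumor_min == 1.0:
        raise ValueError("tumor_min == 1 leaves f unidentifiable from the min allele")
    f = (obs_min - tumor_min) / (1.0 - tumor_min)
    return min(max(f, 0.0), 1.0)  # clamp noisy estimates into [0, 1]
```

Under this model, a region of tumor LOH whose min allele reads 0.25 corresponds to 25% normal tissue, and one reading 0.5 to 50%.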

Page 13: The “Hidden Heritability” Problem

Advanced Tumor QC

AsCN suggests 50% mixed tissue on both the p-arm and the q-arm.

A deletion on the p-arm in the LRR, together with the signature of trisomy in the BAF, indicates either severe normal tissue contamination or a largely tetraploid genome.

The amplification on the q-arm exhibits a BAF signature not associated with disomy.

This sample is most likely tetraploid.

Page 14: The “Hidden Heritability” Problem

Polyploidy Correction

Because normalizing the raw intensity data and transforming it into R and Theta coordinates scales all signals globally to a diploid state, polyploid genomes are also rescaled to disomy.

Rescale AsCN so that the anticipated normal number of copies of each allele is half of the average global ploidy.

Feel free to shift log R ratios to aid visualization, but their usefulness is limited by the small dynamic range of log ratio data.
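The rescaling rule can be sketched in Python as a single multiplication, assuming the correction is linear and that the average global ploidy has already been estimated (for example, from the BAF and AsCN patterns). After diploid normalization a "normal" allele reads about 1; multiplying by ploidy / 2 moves that baseline to half the global ploidy, e.g. 1.5 for a trisomic genome and 2.0 for a tetrasomic one.

```python
def rescale_ascn(ascn: float, global_ploidy: float) -> float:
    """Rescale diploid-normalized AsCN so a normal allele reads global_ploidy / 2.

    Assumes a linear correction and a previously estimated global ploidy.
    """
    return ascn * global_ploidy / 2.0
```

The same factor applies to every marker, so relative differences between alleles and regions are preserved; only the copy number scale changes.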

Page 15: The “Hidden Heritability” Problem

Trisomy Corrected – Primary Tumor

Chromosome 1 by itself is not enough to confirm trisomy, but together with chromosome 3 we have enough information to conclude that global trisomy is present, with ~50% normal tissue contamination and intra-tumor heterogeneity.

Examination of randomly chosen chromosomes across the primary tumor genome also confirms trisomy.

Page 16: The “Hidden Heritability” Problem

Tetrasomy Corrected – Liver Metastatic Tumor

• Tetrasomy confirmed in the presence of intra-tumor heterogeneity

Page 17: The “Hidden Heritability” Problem

Copy Number Discovery

For normal samples, one can use either a hidden Markov model (HMM)-based algorithm or a segmentation-based algorithm and obtain nearly consistent results.

Tumor samples do not have discrete states and are not easily quantified. Therefore, use a segmentation algorithm to discover copy number aberrations, and use the allele-specific copy number and BAF to quantify them.
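To show the idea behind segmentation-based discovery, here is a toy recursive binary segmentation in Python. It is a simplified stand-in, not CBS or any of the tools listed earlier: the split statistic, normalizing by the segment's standard deviation, and the `threshold` and `min_size` defaults are illustrative assumptions. CBS, as implemented in the R package DNAcopy, additionally considers circular (two-breakpoint) splits and judges significance by permutation.

```python
import numpy as np


def _best_split(x: np.ndarray, min_size: int):
    """Return (index, score) of the split that best separates the means of x[:i] and x[i:]."""
    n = len(x)
    best_i, best_score = None, 0.0
    total = x.sum()
    left_sum = 0.0
    for i in range(1, n):
        left_sum += x[i - 1]  # running sum of x[:i]
        if i < min_size or n - i < min_size:
            continue
        left_mean = left_sum / i
        right_mean = (total - left_sum) / (n - i)
        # Scaled mean difference, similar in spirit to a two-sample t-statistic.
        score = abs(left_mean - right_mean) * np.sqrt(i * (n - i) / n)
        if score > best_score:
            best_i, best_score = i, score
    return best_i, best_score


def binary_segment(x, threshold: float = 3.0, min_size: int = 5, offset: int = 0):
    """Recursively split a log ratio profile.

    Returns a list of (start, end, mean) tuples with end exclusive.
    """
    x = np.asarray(x, dtype=float)
    i, score = _best_split(x, min_size)
    sd = float(np.std(x))
    if i is None or sd == 0.0 or score / sd < threshold:
        return [(offset, offset + len(x), float(x.mean()))]
    return (binary_segment(x[:i], threshold, min_size, offset)
            + binary_segment(x[i:], threshold, min_size, offset + i))
```

Each returned segment carries only a mean level; quantifying what that level means in a tumor (allele counts, mixing, ploidy) is left to the allele-specific copy number and BAF, as the slide recommends.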

Page 18: The “Hidden Heritability” Problem

Segmentation Results

Page 19: The “Hidden Heritability” Problem

Segmentation Results

Page 20: The “Hidden Heritability” Problem

Segmentation Results – Chromosome 9

Page 21: The “Hidden Heritability” Problem

Tumor Copy Number Genotypes

Use the allelic imbalance dataset generated in the allele-specific copy number workflow and merge it with the segmentation results.

This will report copy number regions, regions with allelic imbalance, and regions with a combination of the two.

Report the smoothed mean max allele and min allele for each reported region.

Compute the copy number genotypes:

$\text{Max AsCN} = \max\left[A(\text{CN})(1 - \text{BAF}),\ B(\text{CN})(\text{BAF})\right]$

$\text{Min AsCN} = \min\left[A(\text{CN})(1 - \text{BAF}),\ B(\text{CN})(\text{BAF})\right]$
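A sketch of the per-region summary in Python, assuming a hypothetical data layout (segments as (start, end) genomic positions with end exclusive, and per-marker lists of position, total copy number, and BAF). Max and min AsCN are computed per marker and then averaged over the markers in each segment, so markers where the gained allele is A and markers where it is B do not cancel out. It also assumes only informative markers (heterozygous in the paired normal) are passed in, since homozygous markers carry no allelic-imbalance information. This is not Partek's merge procedure.

```python
from statistics import mean


def segment_genotypes(segments, positions, cn, baf):
    """Return (start, end, mean max AsCN, mean min AsCN) for each segment.

    Per marker: A copies = CN * (1 - BAF), B copies = CN * BAF;
    max/min are taken per marker before averaging over the segment.
    """
    results = []
    for start, end in segments:
        idx = [k for k, p in enumerate(positions) if start <= p < end]
        if not idx:
            continue  # no informative markers fall in this segment
        max_ascn = [max(cn[k] * (1 - baf[k]), cn[k] * baf[k]) for k in idx]
        min_ascn = [min(cn[k] * (1 - baf[k]), cn[k] * baf[k]) for k in idx]
        results.append((start, end, mean(max_ascn), mean(min_ascn)))
    return results
```

A balanced diploid region reports (1, 1), while a single-copy gain reports roughly (2, 1) whichever allele was gained at each marker.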

Page 22: The “Hidden Heritability” Problem
Page 23: The “Hidden Heritability” Problem

Thanks and Considerations

• Dr. Seth Crosby and my colleagues at the Genome Technology Access Center.
• Illumina Inc. for allowing me to be here, and for great customer service and excellent products.
• Elaine Mardis et al. of the Genome Center at Wash. U. for letting me use their data.
• The Division of Statistical Genomics for their support and collaboration over the years.
• Partek Inc. of St. Louis for great customer support and an excellent product.