QC & Analysis of Methylation Chip Data

Post on 16-Jan-2022


Allan McRae & Sonia Shah

Outline for Session 3 Lecture

• EWAS analysis
• Inflation in test statistics
• Interpreting EWAS results
• Study design
• Examples: smoking, age, BMI and height, ALS

Epigenome-wide association studies

• Identify changes in methylation levels at single CpG sites that are associated with a human phenotype or disease
• Similar to analysing SNPs in GWAS
• Association analysis between each CpG and the phenotype of interest (~450,000 association analyses)
• Unlike SNPs, DNA methylation measurements are treated as a quantitative measure
• Linear or logistic regression (for binary dependent variables)
• Interpretation of the effect depends on whether methylation is the dependent or the independent variable

CpGmeth ~ smoking + covariates + PCs
disease ~ CpGmeth + covariates + PCs
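The first model above (methylation as the dependent variable) can be sketched as one linear regression per CpG site. This is only an illustration in plain Python. The helper `ols`, the probe names `cg_demo_1` and `cg_demo_2`, and all data values are hypothetical. Age stands in for the covariates, PCs are left out, and only effect estimates are computed (no standard errors or p-values).

```python
def ols(X, y):
    """Least-squares coefficients for y ~ X, solving (X'X) b = X'y.

    Assumes X has full column rank (no perfectly collinear predictors).
    """
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Toy phenotype data (hypothetical): smoking status (0/1) and age in years.
smoking = [0, 1, 0, 1, 0, 1, 0, 1]
age = [30, 35, 40, 45, 50, 55, 60, 65]

# Design matrix: intercept, smoking, age.
X = [[1.0, s, a] for s, a in zip(smoking, age)]

# Toy methylation values (proportion methylated) for two made-up probes.
cpgs = {
    "cg_demo_1": [0.80 - 0.20 * s + 0.001 * a for s, a in zip(smoking, age)],
    "cg_demo_2": [0.50 + 0.002 * a for a in age],
}

# One regression per CpG; index 1 is the smoking coefficient.
smoking_effect = {name: ols(X, y)[1] for name, y in cpgs.items()}
```

In a real EWAS the same loop would run over roughly 450,000 probes, and the test statistic for the smoking term would be kept for each one.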

Visualising results

[Figure: Manhattan plot and QQ plot]

Inflation in lambda

van Iterson et al., Genome Biology 2017

Controlling inflation in EWAS

• Simulation study showing that the genomic inflation factor depends on the number of true associations
• The genomic inflation factor commonly overestimates the true level of test-statistic inflation in EWAS and TWAS
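The genomic inflation factor (lambda) is conventionally the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). A minimal sketch in plain Python shows how lambda rises when true associations are present even though the null tests are perfectly calibrated. The function name `genomic_inflation` and the p-values are illustrative assumptions, not data from the cited study.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(p_values):
    """Lambda: median observed chi-square (1 df) over the expected null median."""
    nd = NormalDist()
    # A two-sided p-value maps to |z|, and z squared is a 1-df chi-square statistic.
    chi2 = [nd.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Perfectly calibrated null: p-values evenly spread over (0, 1).
null_p = [(i + 0.5) / 1000 for i in range(1000)]
lambda_null = genomic_inflation(null_p)

# Same kind of null tests, but 10% of all tests are true associations.
mixed_p = [(i + 0.5) / 900 for i in range(900)] + [1e-8] * 100
lambda_mixed = genomic_inflation(mixed_p)
```

Here `lambda_null` is close to 1, while `lambda_mixed` is clearly above 1 without any inflation of the null tests, which is the overestimation the slide refers to.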

Controlling inflation in EWAS

• http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1131-9 (published Jan 2017)
• EWASs and TWASs are prone not only to significant inflation but also to bias of the test statistics
• Not properly addressed by GWAS-based methodology (i.e. genomic control) or by approaches that control for unmeasured confounding (e.g. RUV, sva and cate)
• Method to estimate the empirical null distribution using Bayesian statistics
• http://bioconductor.org/packages/bacon/

Interpretation of EWAS is much more complicated than GWAS

Study design is very important

Advantage of GWAS

• Genotype is constant from birth
• Genotype comes before phenotype
• No issue of reverse causation, i.e. phenotype does not cause changes in genotype
• Genetic variants are assumed to be randomly assigned with respect to the characteristics of the individual, so confounding bias is minimised
• Ascertainment bias
• Population stratification (which can be corrected for)

Fraga et al., PNAS 2005

Differences in global 5mC DNA content in monozygotic twins

Methylation is dynamic

http://ib.bioninja.com.au/_Media/methylation-factors_med.jpeg

Methylation during development

In utero environment and methylation

• Dutch famine study
• The Dutch famine lasted from November 1944 to May 1945
• Rations were as low as 400-800 calories a day, less than a quarter of the recommended adult caloric intake
• Babies whose mothers went through the Dutch famine had:
  • lower birth weights
  • increased risk of cardiovascular diseases and other adverse health outcomes in adulthood

Methylation is tissue- and cell-specific

Most studies are done in blood due to ease of sample collection

Methylation is tissue- and cell-specific

• Any tissue is suitable if the epigenetic variation is present soma-wide (e.g. if induced during developmental reprogramming in early embryogenesis)
• If changes occur later in life, alternative tissue sources need to be explored
• Tissue heterogeneity: tissues are composed of multiple cell types (e.g. blood contains >50 distinct cell types)
• Disease state itself can also alter cell composition in a tissue (e.g. inflamed tissue vs non-inflamed tissue)

Methylation can be causal or consequential

• Methylation changes can be driven by disease, e.g. alterations in white blood cell proportions in autoimmune disorders or altered metabolic regulation in type 2 diabetes

Birney et al., PLOS Genetics 2016

Confounding in EWAS

• Methylation may be affected by many confounding factors:
  • Environmental exposures, e.g. smoking
  • Batch effects
  • Ascertainment bias
  • Population stratification
• Could adjust for PCs generated from GWAS data if available on the same EWAS samples
• Methods such as SVA and PCA can adjust for known/unknown confounders

Genetic variants also affect methylation

• McRae et al. 2013 Genome Biology
• Investigate the role of genetic heritability in the similarity of DNA methylation between generations
• Family-based sample of 614 individuals from 117 families consisting of twin pairs, their parents and siblings
• After removing all probes overlapping SNPs (1000G EUR), average genetic heritability was 0.187
• Approximately 20% of individual differences in DNA methylation in the population are caused by DNA sequence variation that is not located within CpG sites
• SNPs associated with methylation levels of top heritable probes (mQTLs)

Methylation is dynamic

Study design

Rakyan et al., Nat Rev Genet 2011

Study design

• Investigating the causal effect of an environmental exposure on disease outcome
• 2-step design:
  • EWAS of environmental exposure in healthy individuals to identify changes in methylation as a consequence of exposure
  • Test whether these methylation changes are associated with disease in an independent sample
• Combine study designs, e.g. a discordant monozygotic-twin stage followed by a longitudinal cohort stage

Study design

• Clearly define the hypothesis
  • Understanding mechanism of disease: mediating cell type with high purity
  • Identifying a biomarker of exposure or a predictive/prognostic marker: use of an accessible cell type/biological sample
• Can the study design answer this hypothesis?
• Understand any cell heterogeneity in your sample
• Effect size should be evaluated in the context of functional and biological relevance, e.g. is a methylation difference of 1% large enough to have an impact on disease?
• Integrate data with genetic and transcriptomic data on the same individuals to determine causality

Validation

• Technical validation using a different technology: single locus-specific methylation techniques such as bisulfite (pyro)sequencing
  • Rules out technical errors such as cross-hybridising probes or unrecognised SNPs
• Biological validation of EWAS findings: replicating study results in a comparable but independent sample

Criteria for identification of 'driver' methylation changes

Michels et al., Nat. Methods 2013

Summary of EWAS publications

Michels et al., Nat. Methods 2013

Example 1: Smoking

• Zeilinger et al.
• 450K array
• Discovery sample: current smokers N = 262, never smokers N = 749
• Replication sample: current smokers N = 236, never smokers N = 232
• 972 CpG sites with differential methylation levels after Bonferroni correction (p ≤ 1E-07)
• 187 CpG sites replicated

Zeilinger et al., PLOS ONE 2013
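The p ≤ 1E-07 cut-off is consistent with a Bonferroni correction over roughly 450,000 tests. The arithmetic can be sketched as follows, assuming a family-wise error rate of 0.05 (the slide does not state the alpha used). The function names and the second probe are hypothetical.

```python
N_TESTS = 450_000   # approximate number of CpG sites on the 450K array
ALPHA = 0.05        # assumed family-wise error rate (not stated on the slide)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def significant_sites(p_values, threshold):
    """Names of CpG sites whose p-value is at or below the threshold."""
    return [cpg for cpg, p in p_values.items() if p <= threshold]


threshold = bonferroni_threshold(ALPHA, N_TESTS)
print(f"{threshold:.2e}")  # 1.11e-07

# cg05575921 uses the discovery p-value from the lecture; the other probe is made up.
example = {"cg05575921": 2.54e-182, "cg_hypothetical": 3.0e-05}
hits = significant_sites(example, threshold)
```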

Smoking EWAS: top hit cg05575921

Effect in current smokers vs never smokers:
• Discovery: -24.40%, p = 2.54E-182, explained variance = 41.02%
• Replication: -23.29%, p = 1.81E-64, explained variance = 39.69%

Located within the AHRR gene (chr5)

cg05575921 methylation levels

Prediction of smoking status

Longitudinal analysis of smoking

Two distinct classes of CpG sites identified:

• Sites whose methylation reverts to levels typical of never smokers within decades after smoking cessation
• Sites remaining differentially methylated, even more than 35 years after smoking cessation

Example 2: Age

• Horvath, Genome Biology 2013
• Identify age-associated CpGs in a training set using a penalized regression model (elastic net)
• Identified 353 CpGs
• Predicted age in independent samples and multiple tissues

Horvath, Genome Biology 2013

Example 2: Age

Prediction of age using Horvath CpGs in a Chinese cohort

Methylation age calculator

• https://labs.genetics.ucla.edu/horvath/dnamage/

Example 3: BMI and height

• Shah et al., American Journal of Human Genetics 2014
• Discovery of BMI-associated CpGs in 2 independent samples (LBC and Lifelines)
• Generate genetic risk scores from BMI GWAS SNPs and determine if genetic risk score and methylation risk scores are independently associated with BMI
• Repeat for height

Methods

• Study A (population cohort): EWAS on BMI -> significant probe list A
• Study B (older individuals, 70+): EWAS on BMI -> significant probe list B
• Calculate methylation BMI risk score in study A based on probe list B
• Calculate methylation BMI risk score in study B based on probe list A
• Proportion of variance in BMI explained by methylation score in each study
• Generate genetic scores for BMI in each study using SNPs identified from the largest BMI GWAS (GIANT consortium)
• Look at proportion of variance explained by genetic risk score
• Are methylation and genetic risk scores independently associated with BMI and height?
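The cross-study scoring step can be sketched as a weighted sum of methylation values, with weights taken from the other study's EWAS, followed by the proportion of variance in BMI that the score explains. The slides do not say how the weights were defined, so using EWAS effect sizes as weights is an assumption. All probe names, weights and data values are made up.

```python
def methylation_score(sample_betas, weights):
    """Weighted sum of a sample's methylation values over the probes in `weights`."""
    return sum(w * sample_betas[probe] for probe, w in weights.items())


def r_squared(x, y):
    """Squared Pearson correlation: variance in y explained by a linear fit on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical probe list B: BMI effect sizes estimated in study B.
weights_from_study_b = {"cg_demo_a": 10.0, "cg_demo_b": -5.0}

# Hypothetical study A samples: methylation values and measured BMI.
study_a_betas = [
    {"cg_demo_a": 0.50, "cg_demo_b": 0.30},
    {"cg_demo_a": 0.55, "cg_demo_b": 0.28},
    {"cg_demo_a": 0.60, "cg_demo_b": 0.25},
    {"cg_demo_a": 0.65, "cg_demo_b": 0.20},
]
study_a_bmi = [22.0, 24.5, 26.0, 30.0]

# Score study A with weights from study B, then ask how much BMI variance it explains.
scores = [methylation_score(s, weights_from_study_b) for s in study_a_betas]
variance_explained = r_squared(scores, study_a_bmi)
```

Scoring each study with weights from the other keeps discovery and evaluation samples separate, so the variance explained is not inflated by fitting and testing on the same individuals.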

Example 3: BMI and height

Example 4: C9orf72 repeat expansion

• Hexanucleotide repeat expansion GGGGCC
• 1st intron region of C9orf72
• Most common mutation identified that is associated with familial FTD and/or ALS (5–20% of patients with sporadic ALS)
• Length of repeat in cases can occur in the order of 100s and varies
• <30 repeats generally not associated with disease

Determining causality

Mendelian randomisation

G must be associated with the intermediate phenotype PA

G must not be associated with confounders

G should only be related to the outcome PB via PA

Does genotype affect phenotype via changes in methylation?

• Instrumental variable analysis or Mendelian randomisation analysis
• Step 1: Is there a SNP (not in the probe) that is strongly associated with methylation levels (mQTL)?
• Step 2: CpGmeth ~ SNP
• Step 3: BMI ~ predicted CpGmeth
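Steps 2 and 3 correspond to a two-stage least-squares estimate, sketched here in plain Python on simulated data. The data are constructed so that an unmeasured confounder `u` affects both methylation and BMI but is uncorrelated with the SNP, and the true effect of methylation on BMI is set to 10. All names and values are hypothetical.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def two_stage_estimate(snp, meth, outcome):
    """Two-stage least squares using the SNP as the instrument."""
    a0, a1 = simple_ols(snp, meth)               # Step 2: CpGmeth ~ SNP
    predicted = [a0 + a1 * g for g in snp]       # genetically predicted methylation
    _, effect = simple_ols(predicted, outcome)   # Step 3: BMI ~ predicted CpGmeth
    return effect


# Simulated data: genotype coded 0/1/2, confounder u uncorrelated with genotype.
snp = [0, 0, 1, 1, 2, 2]
u = [-1, 1, -1, 1, -1, 1]
meth = [0.5 + 0.1 * g + 0.05 * c for g, c in zip(snp, u)]
bmi = [25 + 10 * m + 2 * c for m, c in zip(meth, u)]

iv_effect = two_stage_estimate(snp, meth, bmi)   # recovers the simulated effect of 10
_, naive_effect = simple_ols(meth, bmi)          # biased upwards by the confounder
```

The naive regression of BMI on methylation absorbs the confounder and overestimates the effect, while the instrumented estimate does not. That only holds because the simulated SNP satisfies the three Mendelian randomisation assumptions listed earlier.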