Transethnic genetic correlation estimates from summary ... · 2/23/2016 · Consortium, Chun...

1

Transethnicgeneticcorrelationestimatesfromsummarystatistics

BrielinC.Brown,AsianGeneticEpidemiologyNetwork-Type2Diabetes(AGEN-T2D)Consortium,ChunJimmieYe,AlkesL.Price,NoahZaitlen

Abstract

Theincreasingnumberofgeneticassociationstudiesconductedinmultiple

populationsprovidesunprecedentedopportunitytostudyhowthegeneticarchitectureof

complexphenotypesvariesbetweenpopulations,aproblemimportantforbothmedicaland

populationgenetics.Herewedevelopamethodforestimatingthetransethnicgenetic

correlation:thecorrelationofcausalvarianteffectsizesatSNPscommoninpopulations.We

takeadvantageoftheentirespectrumofSNPassociationsanduseonlysummary-level

GWASdata.Thisavoidsthecomputationalcostsandprivacyconcernsassociatedwith

genotype-levelinformationwhileremainingscalabletohundredsofthousandsof

individualsandmillionsofSNPs.Weapplyourmethodtogeneexpression,rheumatoid

arthritis,andtype-twodiabetesdataandoverwhelminglyfindthatthegeneticcorrelationis

significantlylessthan1.Ourmethodisimplementedinapythonpackagecalledpopcorn.

Introduction

Manycomplexhumanphenotypesvarydramaticallyintheirdistributionsbetween

populationsduetoacombinationofgeneticandenvironmentaldifferences.Forexample,

northernEuropeansareonaveragetallerthansouthernEuropeans1andAfricanAmericans

haveanincreasedrateofhypertensionrelativetoEuropeanAmericans2.Thegenetic

contributiontopopulationphenotypicdifferentiationisdrivenbydifferencesincausal

allelefrequencies,effectsizes,andgeneticarchitectures.Understandingtherootcausesof

phenotypicdifferencesworldwidehasprofoundimplicationsforbiomedicalandclinical

practiceindiversepopulations,thetransferabilityofepidemiologicalresults,aidingmulti-

ethnicdiseasemapping3,4,assessingthecontributionofnon-additiveandrarevariant

effects,andmodelingthegeneticarchitectureofcomplextraits.Inthisworkweconsidera

centralquestionintheglobalstudyofphenotype:dogeneticvariantshavethesame

phenotypiceffectsindifferentpopulations?

WhilethevastmajorityofGWAShavebeenconductedinEuropeanpopulations5,

thegrowingnumberofnon-Europeanandmulti-ethnicstudies4,6,7provideanopportunity

tostudygeneticeffectdistributionsacrosspopulations.Forexample,onerecentstudyused

mixed-modelbasedmethodstoshowthatthegenome-widegeneticcorrelationof

.CC-BY-NC-ND 4.0 International licenseunder anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available

The copyright holder for this preprint (which wasthis version posted February 23, 2016. ; https://doi.org/10.1101/036657doi: bioRxiv preprint

https://doi.org/10.1101/036657http://creativecommons.org/licenses/by-nc-nd/4.0/

2

schizophreniabetweenEuropeanandAfricanAmericansisnonzero8.Whilepowerful,

computationalcostsandprivacyconcernslimittheutilityofgenotype-basedmethods.In

thiswork,wemaketwosignificantcontributionstostudiesoftransethnicgenetic

correlation.First,weexpandthedefinitionofgeneticcorrelationtobetteraccountfora

transethniccontext.Second,wedevelopanapproachtoestimatinggeneticcorrelation

acrosspopulationsthatusesonlysummarylevelGWASdata.Similartootherrecent

summarystatisticsbasedmethods9–20,ourapproachsupplementssummaryassociation

datawithlinkagedisequilibrium(LD)informationfromexternalreferencepanels,avoids

privacyconcerns,andisscalabletohundredsofthousandsofindividualsandmillionsof

markers.UnliketraditionalapproachesthatfocusonthesimilarityofGWASresults21–25we

utilizetheentirespectrumofGWASassociationswhileaccountingforLDinordertoavoid

filteringcorrelatedSNPs.

Inasinglepopulation,thegeneticcorrelationoftwophenotypesisdefinedasthe

correlationcoefficientofSNPeffectsizes18,26.Inmultiplepopulations,differencesinallele

frequencymotivatemultiplepossibledefinitionsofgeneticcorrelation.Becauseavariant

mayhaveahighereffectsizebutlowerfrequencyinonepopulation,weconsiderboththe

correlationofalleleeffectsizesaswellasthecorrelationofallelicimpact.Wedefinethe

transethnicgeneticeffectcorrelation(ρge,previouslydefinedbyLeeetal26andimplemented

inGCTA)asthecorrelationcoefficientoftheper-alleleSNPeffectsizes,andthetransethnic

geneticimpactcorrelation(ρgi)asthecorrelationcoefficientofthepopulation-specificallele

variancenormalizedSNPeffectsizes.

Intuitively,thegeneticeffectcorrelationmeasurestheextenttowhichthesame

varianthasthesamephenotypicchange,whilethegeneticimpactcorrelationgivesmore

weighttocommonallelesthanrareonesseparatelyineachpopulation.Considerthecaseof

aSNPthatisrareinpopulation1butcommoninpopulation2,andhasanidenticaleffect

sizeinbothpopulations.Inthiscase,thecorrelationofeffectsizes(thegeneticeffect

correlationρge)is1,butthisprovidesanincompletepictureoftherelationshipbetweenthe

twopopulations,astheallelehasamuchbiggerimpactonthedistributionofthephenotype

inpopulation2.Therefore,wedefinethegeneticimpactcorrelationρgiasthecorrelationof

effectsizesafternormalizinggenotypestohavemean0andvariance1.Inourhypothetical

caseρgi

3

twopopulationswillbesimilarandthereforeρgiwillbecloseto1.Whileotherdefinitionsof

thegeneticcorrelationarepossible(seediscussion),thesequantitiescapturetwoimportant

questionsaboutthestudyofdiseaseinmultiplepopulations:towhatextentdothesame

mutationsinmultiplepopulationsdifferintheirphenotypiceffects?And,towhatextentare

thesedifferencesmitigatedorexacerbatedbydifferencesinallelefrequency?

Toestimategeneticcorrelation,wetakeaBayesianapproachwhereinweassume

genotypesaredrawnseparatelyfromwithineachpopulationandeffectsizeshaveanormal

prior(theinfinitesimalmodel27).Whileunlikelytorepresentreality,thismodelhasbeen

usedsuccessfullyinpractice8,16,17,28,29.Theinfinitesimalassumptionyieldsamultivariate

normaldistributionontheobservedteststatistics(Z-scores),whichisafunctionofthe

heritabilityandgeneticcorrelation.RatherthanpruningSNPsinLD10,30,31,thisallowsusto

explicitlymodeltheresultinginflationofZ-scores.Wethenmaximizeanapproximate

weightedlikelihoodfunctiontofindtheheritabilityandgeneticcorrelation.Thismethodis

implementedinapythonpackagecalledpopcorn.Thoughderivedforquantitative

phenotypes,popcornextendseasilytobinaryphenotypesundertheliabilitythreshold

model.Weshowviaextensivesimulationthatpopcornproducesunbiasedestimatesofthe

geneticcorrelationandthepopulationspecificheritabilities,withastandarderrorthat

decreasesasthenumberofSNPsandindividualsinthestudiesincreases.Furthermore,we

showthatourapproachisrobusttoviolationsoftheinfinitesimalassumption.

WeapplypopcorntoEuropeanandYorubangeneexpressiondata32aswellasGWAS

summarystatisticsfromEuropeanandEastAsianrheumatoidarthritisandtype-two

diabetescohorts,33,34.OuranalysisofgEUVADISshowsthatoursummarystatisticbased

estimatorisconcordantwiththemixedmodelbasedestimator.Wefindthatthemean

transethnicgeneticcorrelationacrossallgenesislow(ρge=0.320(0.009)),butincreases

substantiallywhenthegeneishighlyheritableinbothpopulations(ρge=0.772(0.017)).in

RAandT2D,wefindthegeneticeffectcorrelationtobe0.463(0.058)and0.621(0.088),

respectively.

Acrossallphenotypesconsidered,weoverwhelminglyfindthatthetransethnic

geneticcorrelationissignificantlylessthanone.Thisobservationhighlightstheneedto

studyphenotypesinmultiplepopulationsasitimpliesthat,uptotheeffectsofun-observed

variants,effectsizesatcommonSNPstendtodifferbetweenpopulations.Thisindicates

thatGWASresultsmaynottransferbetweenpopulations,andthereforediseaserisk

predictioninnon-EuropeansbasedoncurrentGWASresultsmaybeproblematic,




4

necessitatingamulti-populationapproachtogaininsightintointer-populationdifferences

inthegeneticarchitectureofcomplextraits.

Methods

Ourmethodtakesasinputsummaryassociationstatisticsfromtwostudiesofa

phenotypeintwodifferentpopulations,alongwithtwosetsofreferencegenotypeseach

matchingoneofthepopulationsinthestudy.Ourmethodhastwosteps:first,weestimate

thediagonalelementsoftheLDmatrixproductsΣ12,Σ22andΣ1Σ2,then,usingthese

estimates,wefindthemaximumlikelihoodvaluesandestimatestandarderrorsofthe

parametersofinterest:h12,h22andρgeorρgi.Thedetailsfollow.

ConsiderGWASofaphenotypeconductedintwodifferentpopulations.Assumewe

haveN1individualsgenotypedonMSNPsinstudyoneandN2individualsgenotypedonthe

sameSNPsstudytwo.LetX1,X2bethematricesofmean-centeredgenotypesinstudyone

andstudytwo,respectively,andletY1,Y2betheirnormalizedphenotypes.Letf1,f2be

vectorsoftheallelefrequenciesoftheMSNPscommontobothpopulations.Assuming

Hardy-Weinbergequilibriumwithineachpopulationseparately,theallelevariancesareσ12

=2f1(1–f1),σ22=2f2(1–f2).Letβ1,β2bethe(unobserved)per-alleleeffectsizesforeachSNP

instudiesoneandtwo,respectively.Theheritabilityinstudyoneisthenh12=Σiσ1i2β1i2

(andlikewiseforstudytwo).Theobjectiveofthisworkistoestimatetransethnicgenetic

correlationfromsummarystatisticsofcommonvariants (andlikewise

forstudytwo)andestimatesofpopulationLDmatrices(Σ1andΣ2)fromexternalreference

panels.Definethegeneticeffectcorrelationρge=Cor(β1,β2)andthegeneticimpact

correlationρgi=Cor(σ1β1,σ2β2).

Weassumethegenotypesaredrawnrandomlyfromeachpopulationandthat

phenotypesaregeneratedbythelinearmodelY1=X1β1+ε1(likewiseforphenotypetwo).

Wheneffectsizesβareassumedinverselyproportionaltoallelefrequency,asiscommonly

done16,29,weshow(Appendix)thatunderthelinearinfinitesimalgeneticarchitecture,the

jointdistributionoftheZ-scoresfromeachstudyisasymptoticallymultivariatenormalwith

mean andvariance:

(1)

Z1 =(X1/�1)>Y1p

N1

�!0

Var(Z) =

"⌃1 +

N1+1M h

21⌃

21 ⇢gi

ph21h

22

pN1N2M ⌃1⌃2

⇢gip

h21h22

pN1N2M ⌃2⌃1 ⌃2 +

N2+1M h

22⌃

22

#




5

However,wheneffectsizesareassumedindependentofallelefrequencyweshow:

(2)

Giventheseequationsforvariance,thequantitiesρgiorρgeandh12,h22canbeestimatedby

maximizingthemultivariatenormallikelihood,

,whereCiseitheroftheabovecovariance

matrices(1)and(2).BecauseΣ1andΣ2areestimatedfromfiniteexternalreferencepanels,

maximumlikelihoodestimationoftheabovemultivariatenormalleadstoover-fitting.We

employtwooptimizationstoavoidthisproblem.First,wemaximizeanapproximate

weightedlikelihoodthatusesonlythediagonalelementsofeachblockofVar(Z).This

allowsustoaccountfortheLD-inducedinflationoftestsstatistics,butdiscardscovariance

informationbetweenpairsofZ-scores,andthereforeleadstoover-countingZ-scoresof

SNPsinhighLD.Tocompensateforthis,wedownweightZ-scoresofSNPsinproportion

theirLD.Second,ratherthancomputethefullproductsΣ12,Σ22andΣ1Σ2overallMSNPsin

thegenome,wechooseawindowsizeWandapproximatetheproductby

.TheseoptimizationsaresimilartothoseemployedbyLDscore

regression16.Thefulldetailsofthederivationandoptimizationareprovidedinthe

appendix.

Results

SimulatedGenotypesandSimulatedPhenotypes

Wesimulated50,000European-like(EUR)and50,000EastAsian-like(EAS)

individualsat248,953SNPsfromchromosomes1-3withallelefrequencyabove1%inboth

EuropeanandEastAsianHapMap3populationswithHapGen235.HapGen2implementsa

haplotyperecombinationwithmutationmodelthatresultsinexcesslocalrelatedness

amongthesimulatedindividuals.Toaccountforthislocalstructure,weusedPlink236to

filterindividualswithgeneticrelatednessabove0.05,resultingin4499EUR-likeindivudals

and4837EAS-likeindividuals.Fromthesesimulatedindividuals,500perpopulationwere

chosenuniformlyatrandomtoserveasanexternalreferencepanelforestimatingΣ1andΣ2.

l(⇢g{i,e}, h21, h

22|Z,⌃,�) / �ln(|C|)� Z>C�1Z

(⌃a⌃b)ii =w=i+WX

w=i�Wraiwrbiw

Var(Z) =

2

4⌃1 +

N1+1k�21k1

h21⌃1�21⌃1 ⇢ge

ph21h

22

pN1N2p

k�21k1k�22k1⌃1

p�21�

22⌃2

⇢gep

h21h22

pN1N2p

k�21k1k�22k1⌃2

p�22�

21⌃1 ⌃2 +

N2+1k�22k1

h22⌃2�22⌃2

3

5




6

Ineachsimulationeffectsizesweredrawnfroma“spikeandslab”model,where

withprobabilitypand with

probability1-p.ρgiwasanalyticallycomputedfromthesimulatedeffectsizesandallele

frequenciesinthesimulatedreferencegenotypes.Quantitativephenotypesweregenerated

underalinearmodelwithi.i.d.noiseandnormalizedtohavemean0andvariance1,while

binaryphenotypesweregeneratedunderaliabilitythresholdmodelwhereindividualsare

labeledcaseswhentheirliabilityexceedsathreshold ,withKthe

populationdiseaseprevalence37.

Wevariedh12,h22,ρge,andρgi,aswellasthenumberofindividualsineachstudy(N1,

N2),thenumberofSNPs(M),thepopulationprevalenceK,andproportionofcausalvariants

(p)inthesimulatedGWASandgeneratedsummarystatisticsforeachstudy.Theresults

showninFigure1andFigureS1demonstratethattheestimatorsarenearlyunbiasedasthe

geneticcorrelationandheritabilitiesvary.Furthermore,byvaryingtheproportionofcausal

variantspweshowthatourestimatorisrobusttoviolationsoftheinfinitesimalassumption

(FigureS2).InfigureS3,weshowthatthestandarderroroftheestimatordecreasesasthe

numberofSNPsandindividualsinthestudyincreases.Finally,weshowinTableS1thatour

estimatesoftheheritabilityofliabilityincasecontrolstudiesarenearlyunbiased.

Simulationswithnonstandarddiseasemodels

Ourapproach,aswellasgenotype-basedmethodssuchasGCTA,makes

assumptionsaboutthegeneticarchitectureofcomplextraits.Previousworkhasshownthat

violationsoftheseassumptionscanleadtobiasinheritabilityestimation38,thereforewe

soughttoquantifytheextentthatthisbiasmayeffectourestimates.Wesimulated

phenotypesundersixdifferentdiseasemodels.Independent:effectsizeindependentof

allelefrequency.Inverse:effectsizeinverselyproportionaltoallelefrequency.Rare:only

SNPswithallelefrequencyunder10%affectthetrait.Common:onlySNPswithallele

frequencybetween40%and50%affectthetrait.Difference:effectsizeproportionalto

differenceinallelefrequency.Adversarial:differencemodelwithsignofbetasettoincrease

thephenotypeinthepopulationwherethealleleismostcommon.Additionalgenetic

architecturesarepossible,includingoneswhereeffectsizesarenotadirectfunctionof

MAF39.

�1i,�2i ⇠ N✓0,

h21 ⇢ge

ph21h

22

⇢geph21h

22 h

22

�◆

�1i,�2i = (0, 0)

⌧ = ��1(1�K)




7

Wesimulatedphenotypesusinggenotypeswithallelefrequencyabove1%or5%

andcomparedthetrueandestimatedgeneticimpactandeffectcorrelationamongall

models(Table1).WefindthatwhenonlySNPswithfrequencyabove5%inboth

populationsareused,thedifferenceinρgeandρgiisminimalexceptinthemostadversarial

cases.Evenintheadversarialmodel,thetruedifferenceisonly7%.Thoughunlikelyto

representreality,thefournonstandarddiseasemodelsresultinsubstantialbiasinour

estimators.WhenSNPswithallelefrequencyabove1%inbothpopulationsareincluded,

thedifferencesaremorepronounced.Thisisbecausethenormalizingconstant1/σrapidly

increasesastheSNPbecomesmorerare.Indeed,asSNPsbecomemorerarehavingan

accuratediseasemodelbecomesincreasinglyimportant.Thereforeweproceedwitha5%

MAFcutoffinouranalysisofrealdata,andusethenotationhc2torefertotheheritabilityof

SNPswithallelefrequencyabove5%inbothpopulations(thecommon-SNPheritability).

Note,however,thatoneoftheadvantagesofmaximumlikelihoodestimationingeneralis

thatthelikelihoodcanbereformulatedtomimicthediseasemodelofinterest.

ValidationofPopcornusinggeneexpressioninGEUVADIS

Wecomparedthecommon-SNPheritability(hc2)andgeneticcorrelationestimates

ofpopcorntoGCTAinthegEUVADISdatasetforwhichrawgenotypesarepubliclyavailable.

gEUVADISconsistsofRNA-seqdatafor464lymphoblastoidcellline(LCL)samplesfrom

fivepopulationsinthe1000genomesproject.Ofthese,375areofEuropeanancestry(CEU,

FIN,GBR,TSI)and89areofAfricanancestry(YRI).RawRNA-sequencingreadsobtained

fromtheEuropeanNucleotideArchivewerealignedtothetranscriptomeusingUCSC

annotationsmatchinghg19coordinates.RSEMwasusedtoestimatetheabundancesofeach

annotatedisoformandtotalgeneabundanceiscalculatedasthesumofallisoform

abundancesnormalizedtoonemilliontotalcountsortranscriptspermillion(TPM).For

eQTLmapping,CaucasianandYorubansampleswereanalyzedseparately.Foreach

population,TPMsweremedian-normalizedtoaccountfordifferencesinsequencingdepth

ineachsampleandstandardizedtomean0andvariance1.Ofthe29763totalgenes,9350

withTPM>2inbothpopulationswerechosenforthisanalysis.

Foreachgene,weconductedacis-eQTLassociationstudyatallSNPswithin1

megabaseofthegenebodywithallelefrequencyabove5%inbothpopulationsusing30

principalcomponentsascovariates.WefoundthatGCTAandpopcornagreeontheglobal

distributionofheritability(FigureS3),andthatGCTA’sestimatesofgeneticcorrelation




8

haveasimilardistributiontopopcorn’sgeneticeffectandgeneticimpactcorrelation

estimates(Figure2).WhilethenumberofSNPsandindividualsincludedineachgene

analysisaretoosmalltoobtainaccuratepointestimatesofthegeneticcorrelationonaper-

genebasis(N=464,M=4279.5),thelargenumberofgenesenablesaccurateestimationof

theglobalmeanheritabilityandgeneticcorrelation.

Common-SNPheritabilityandgeneticcorrelationofgeneexpressioningEUVADIS

Wefindthattheaveragecis-hc2oftheexpressionofthegenesweanalyzedwas

0.093(0.002)inEURand0.088(0.002)inYRI.Ourestimatesarehigherthanpreviously

reportedaveragecis-heritabilityestimatesof0.055inwholebloodand0.057inadipose40,

whichcouldariseforseveralreasons.First,weremove68%ofthetranscriptsthatare

lowlyexpressedineithertheYRIorEURdata.Second,estimatesfromRNA-seqanalysisof

celllinesmightnotbedirectlycomparabletomicroarraydatafromtissue.

Theaveragegeneticeffectcorrelationwas0.320(0.010),whiletheaveragegenetic

impactcorrelationwas0.313(0.010).Notably,thegeneticcorrelationincreasesasthecis-

hc2ofexpressioninbothpopulationsincreases(Figure3).Inparticular,whenthecis-hc2of

thegeneisatleast0.2inbothpopulationsthegeneticeffectcorrelationwas0.772(0.017)

whilethegeneticimpactcorrelationwas0.753(0.018).

Inordertoverifythattherewerenosmall-samplesizeorconditioningbiasesinour

analysis,weanalyzedthegeneticcorrelationofsimulatedphenotypesoverthegEUVADIS

genotypes.Wesampledpairsofheritabilitiesfromtheestimatedexpressionheritability

distributionandsimulatedpairsofphenotypestohavethegivenheritabilityandagenetic

effectcorrelationof0.0overrandomlychosen4000baseregionsfromchromosome1ofthe

gEUVADISgenotypes.Withoutconditioning,theaverageestimatedgeneticeffect

correlationwas-0.002(0.003),indicatingthattheestimatorremainedunbiased.

Furthermore,theaverageestimatedgeneticeffectcorrelationwasnotsignificantlydifferent

from0.0conditionalontheestimatesofheritabilitybeingaboveacertainthreshold(Figure

S4).

Wefindthatwhiletheaveragegeneticcorrelationislow,thegeneticcorrelation

increaseswiththecis-hc2ofthegene,indicatingthatascis-geneticregulationofgene

expressionincreasesitdoessosimilarlyinbothYRIandEURpopulations.Thismayhelp

interprettherecentobservationthatwhiletheglobalgeneticcorrelationofgeneexpression

acrosstissuesislow40,cis-eQTL’stendtoreplicateacrosstissues41.Asthepresenceofacis-




9

eQTLindicatessubstantialcis-geneticregulation,ananalysisofeQTLreplicationacross

tissuesisimplicitlyconditioningontheheritabilityofgeneexpressionbeinghighand

thereforemayindicatemuchhighergeneticcorrelationthanaverage.

SummarystatisticsofRAandT2D

Finally,wesoughttoexaminethetransethnicρgiandρgeinRAandT2Dcohortsfor

whichrawgenotypesarenotavailable.WeobtainedsummarystatisticsofGWASfor

rheumatoidarthritisandtype-2diabetesconductedinEuropeanandEastAsian

populations.Weusedgenotypesfrom504EastAsianand503Europeanindividuals

sequencedaspartofthe1000genomesprojectaspopulation-specificexternalreference

panelsforourEASandEURsummarystatistics,respectively.WeremovedtheMHCregion

(chromosome6,25–35Mb)fromtheRAsummarystatistics.Weestimatedthecommon-

SNPheritabilityandgeneticcorrelationusing2,539,629SNPsgenotypedorimputedinboth

RAstudiesand1,054,079SNPsgenotypedorimputedinbothT2Dstudieswithallele

frequencyabove5%in1000genomesEURandEASpopulations.Thehc2andgenetic

correlationestimatesarepresentedinTable2.OurRAhc2estimatesof0.177(0.015)and

0.221(0.026)forEURandEAS,respectively,arelowerthanapreviouslyreportedmixed-

modelbasedheritabilityestimatesof0.32(0.037)inEuropeans42.Similarly,ourT2Dhc2

estimatesof0.242(0.013)and0.105(0.021)forEURandEAS,respectively,arelowerthan

apreviouslyreportedmixed-modelbasedestimateof0.51(0.065)inEuropeans42.We

stressthatthisdiscrepancyislikelyduetothedifferencebetweencommon-SNPheritability

hc2andtotalnarrow-senseheritabilityhg2.Furthermore,estimatesoftheheritabilityofT2D

fromfamilystudiescanvarysignificantly43,44.

WefindthegeneticeffectcorrelationinRAandT2Dtobe0.463(0.058)and0.621

(0.088),respectively,whilethegeneticimpactcorrelationisnotsignificantlydifferentat

0.455(0.056)and0.606(0.083).Thetransethnicgeneticimpactandeffectcorrelationfor

bothphenotypesaresignificantlydifferentfromboth1and0(Table2),showingthatwhile

thereiscleargeneticoverlapbetweenthephenotypes,theperalleleeffectsizesdiffer

significantlybetweenthetwopopulations.

Discussion

Wehavedevelopedthetransethnicgeneticeffectandgeneticimpactcorrelationand

providedanestimatorforthesequantitiesbasedonlyonsummary-levelGWASinformation




10

andsuitablereferencepanels.Wehaveappliedourestimatortoseveralphenotypes:

rheumatoidarthritis,type-2diabetesandgeneexpression.WhilethegEUVADISdataset

lacksthepowerrequiredtomakeinferencesaboutthegeneticcorrelationofsingleorsmall

subsetsofgenes,wecanmakeinferencesabouttheglobalstructureofgeneticcorrelationof

geneexpression.Wefindthattheglobalmeangeneticcorrelationislow,butthatit

increasessubstantiallywhentheheritabilityishighinbothpopulations.Inallphenotypes

analyzed,thegeneticcorrelationissignificantlydifferentfromboth0and1.Ourresults

showthatglobaldifferencesinSNPeffectsizeofcomplextraitscanbelarge.Incontrast,

effectsizesofgeneexpressionappeartobemoreconservedwherethereisstronggenetic

regulation.

Itisnotpossibletodrawconclusionsaboutpolygenicselectionfromestimatesof

transethnicgeneticcorrelation.Theeffectsizesmaybeidentical(!!" = 1)whilepolygenicselectionactstochangeonlytheallelefrequencies.Similarly,theeffectsizesmaybe

different(!!" < 1)withoutselection.DifferencesineffectsizesatcommonSNPscanresultfrommanyphenomena.Weexpectun-typedandun-imputedvariantsdifferentiallylinked

toobservedSNPstocontributesignificantly,alongwithrareorpopulation-specificvariants

differentiallylinkedtoobservedSNPs.Ifagene-geneorgene-environmentinteraction

exists,butonlymarginaleffectsaretested,theobservedmarginaleffectsmaybedifferentin

eachpopulationduetoallelefrequencydifferenceseveniftheinteractioneffectisthesame

inbothpopulations,andthiswillresultindecreasedgeneticcorrelation.Whilewithin-locus

(dominance)interactionsmayalsoplayarole45,themagnitudeofthiseffecthasbeen

debated46.Weemphasizethatwecannotdifferentiatebetweentheseeffectsonthebasisof

thisanalysisalone,andfurtherresearchisrequiredtoestablishthemagnitudeofthe

contributionofeachoftheseeffectstointer-populationeffectsizedifferences.

Estimatesofthetransethnicgeneticcorrelationareimportantforseveralreasons.

Theymayhelpinformbestpracticesfortransethnicmeta-analysis,potentiallyoffering

improvementsovercurrentmethodsthatuseFsttoclusterpopulationsforanalysis4.

Further,thetransethnicgeneticcorrelationconstrainsthelimitofoutofsamplephenotype

predictivepower.IfthemaximumwithinpopulationcorrelationofpredictedphenotypeP

totruephenotypeYis ,thenthemaximumoutofpopulationcorrelationis

(Appendix).OurobservationthatforRA,T2D,andgeneexpressionthe

geneticcorrelationislowshowsthatoutofpopulationphenotypicpredictivepowerisquite

⇢maxY P

=q

h21

⇢maxY P

= ⇢ge

qh21




11

low.Similarly,itimpliesthatdiseaseriskassessmentinnon-Europeansbasedoncurrent

GWASresultsmaybeproblematic,necessitatingincreasedstudyofdiseaseinmany

populationstogaininsightintodifferencesingeneticarchitectureandimproverisk

assessment.

Whilethegeneticcorrelationofmultiplephenotypesinonepopulationhasa

relativelystraightforwarddefinition,extendingthistomultiplepopulationsmotivates

multiplepossibleextensions.Inthisworkwehaveprovidedestimatorsforthecorrelation

ofgeneticeffectandgeneticimpactbutotherquantitiesrelatedtothesharedgeneticsof

complextraitsbetweenpopulationsincludethecorrelationofvarianceexplained

andproportionofsharedcausalvariantsbetweenthetwo

populations.Interestingly,whileourgoalwastoconstructanestimatorthatdeterminedthe

extentofgeneticsharingindependentofallelefrequency,weobservethatthecorrelationof

geneticeffectandgeneticimpactaresimilar.Furthermore,oursimulationsshowthatunder

arandomeffectsmodelutilizingonlySNPswithallelefrequencyabove5%inboth

populationsthetruegeneticeffectandgeneticimpactcorrelationaresimilar.Weconclude

thatatvariantscommoninbothpopulations,differencesineffectsizeandnotallele

frequencyaredrivingthetransethnicphenotypicdifferencesinthesetraits.

Ourapproachtoestimatinggeneticcorrelationhastwomajoradvantagesover

mixed-modelbasedapproaches.First,utilizingsummarystatisticsallowsapplicationofthe

methodwithoutdata-sharingandprivacyconcernsthatcomewithrawgenotypes.Second,

ourapproachislinearinthenumberofSNPsavoidingthecomputationalbottleneck

requiredtoestimatethegeneticrelationshipmatrix.Conceptually,ourapproachisvery

similartothattakenbyLDscoreregression.Indeed,thediagonaloftheLDmatrixproduct

inonepopulationareexactlytheLD-scores( ).Onecouldignoreourlikelihood-

basedapproachanddefinecross-populationscores inordertoexploitthe

linearrelationship (asimilarapproachcanbetakenfor

thegeneticeffectcorrelation).SinceLD-scoreregressionhasbeensuccessfullyusedto

computethegeneticcorrelationoftwophenotypesinasinglepopulation,thisderivation

canbeviewedasanextensionofLD-scoreregressiontoonephenotypeintwodifferent

populations.Themaindifferenceinourapproachischoosingmaximumlikelihoodrather

⇢ge = Cor(�21�

21 ,�

22�

22)

⌃21ii = li

ci =X

m

r1imr2im

E[Z1iZ2i] =pN1N2M

⇢gi

qh21h

22ci




12

thanregressioninordertofitthemodel.Acomparisonofourmethodtotheldscsoftware

showstheyperformsimilarlyasheritabilityestimators(FigureS5).

Ofcourse,ourmethodisnotwithoutdrawbacks.First,itrequiresalargesample

sizeandlargenumberofSNPstoachievestandarderrorslowenoughtogenerateaccurate

estimates.UntilrecentlylargesampleGWAShavebeenrareinnon-Europeanpopulations,

thoughtheyarebecomingmorecommon.Similarly,referencepanelqualitymaysufferin

non-Europeanpopulationsandthismayimpactdownstreamanalysis47.Second,itislimited

toanalyzingrelativelycommonSNPs,bothbecausehavinganaccuratediseasemodelis

importantfortheanalysisofrarevariantsandbecauseeffectsizeandcorrelationcoefficient

estimateshaveahighstandarderroratrareSNPs16.Third,ouranalysisiscurrentlylimited

toSNPsthatarepresentinbothpopulations.Indeeditiscurrentlyunclearhowbestto

handlepopulation-specificvariantsinthisframework.Fourth,ourestimatorofρis

boundedbetween-1and1.Thismayinducebiaswhenthetruevalueisclosetothe

boundaryandthesamplesizeissmall.Finally,admixedpopulationsinduceverylong-range

LDthatisnotaccountedforinourapproachandwearethereforelimitedtoun-admixed

populations16.

Ouranalysisleavesopenseveralavenuesforfuturework.Weapproximately

maximizethelikelihoodofan!×!multivariatenormaldistributionviaamethodthatusesonlythediagonalelementsofeachblock,discardingcovarianceinformationbetweenZ-

scores.Abetterapproximationmaylowerthestandarderroroftheestimator,facilitating

ananalysisofthegeneticcorrelationoffunctionalcategories,pathwaysandgeneticregions.

Wewouldalsoliketoextendouranalysistoincludepopulationspecificvariantsaswellas

variantsatfrequenciesbetween1-5%orlowerthan1%.Oursimulationsindicatethat

havinganaccuratediseasemodelisimportantfordeterminingthedifferencebetweenthe

geneticeffectandgeneticimpactcorrelationwhenrarevariantsareincluded.Maximum

likelihoodapproachesarewellsuitedtodifferentgeneticarchitectures.Forexample,one

couldestimateboththeglobalrelationshipbetweenallelefrequencyandeffectsizeandthe

globalrelationshipbetweenper-SNPFSTandgeneticcorrelationbyincorporating

parametersαandϒintothepriordistributionoftheeffectsizes,

.Weexpectthatincorporating

theseparameterswillimproveestimatesofheritabilityandgeneticcorrelationwhile

revealingimportantbiologicalinsights.

�1i,�2i ⇠ N✓0,

h21�

↵1i ⇢ge

ph21h

22F

�STi

⇢gep

h21h22F

�STi h

22�

↵1i

�◆




Appendix

Consider two GWAS of a phenotype conducted in di↵erent populations populations. Assume we have N1

individuals genotyped or imputed to M SNPs in study one and N2 individuals genotyped or imputed to

M SNPs in study two. Let X1, X2 and Y1, Y2 be the matrices of mean-centered genotypes and phenotypes

of the individuals in study one and two, respectively, with f1, f2 the allele frequencies of the M SNPs

common to both populations. Assuming Hardy-Weinberg equilibrium, the allele variances are �21 = 2f1(1�

f1),�22 = 2f2(1� f2). Let �1,�2 be the (unobserved) per-allele e↵ect sizes for each SNP in studies one and

two, respectively. Define the genetic impact correlation ⇢gi

= Cor(p�21�1,

p�22�2) and the genetic e↵ect

correlation ⇢ge

= Cor(�1,�2). We present a maximum likelihood framework for estimating the heritability of

the phenotype in study 1 and it’s standard error, the heritability of the phenotype in study 2 and it’s standard

error, and the genetic e↵ect and impact correlation of the phenotype between the studies and it’s standard

error given only the summary statistics Z1, Z2 and reference genotypes G1, G2 representing the populations

in the studies. We assume that genotypes are drawn randomly from populations with expected correlation

matrices ⌃1 (and similarly for study two), and that every SNP is causal with a normally distributed e↵ects

size (though this assumption is not necessary in practice, see Figure S1).

Genetic impact correlation

Let X 01 =X1p�

21

(and similarly for study 2) be normalized genotype matrices. We consider the standard linear

model for generation of the phenotypes, where

Y1 = X01�1 + ✏1

Y2 = X02�2 + ✏2

For convenience of notation let h2ix

= ⇢gi

ph21h

22. We assume the SNP e↵ects follow the infinitesimal model,

where every SNP has an e↵ect size drawn from the normal distribution, and that the residuals are independent

for each individual and normally distributed:

0

@ �1�2

1

A ⇠ N

0

@

2

4 0

0

3

5 , 1M

2

4 h21IM h2

ix

IM

h2ix

IM

h22IM

3

5

1

A (1)

0

@ ✏1✏2

1

A ⇠ N

0

@

2

4 0

0

3

5 ,

2

4 (1� h21)IM 0

0 (1� h22)IM

3

5

1

A (2)

1




where h21, h22 are the heritability of the disease in study one and two, respectively, and ⇢gi is the genetic

impact correlation.

Using the above model, we compute the distribution of the observed Z scores as a function of the reference

panel correlations and the model parameters (h21, h22, ⇢gi). Given a distribution for Z and an observation of

Z we can then choose parameters which give the highest probability of observing Z. First, we compute the

distribution of Z. It is well known that the Z-scores of a linear regression are normally distributed given

� when the sample size is large enough. Since P(Z) / P(Z|�)P(�) and the product of normal distributions

is normal, we only need to compute the unconditional mean and variance of Z to know its distribution.

Specifically, let Z = [Z>1 , Z>2 ]

>, then it’s mean is

E[Z] = E

2

4X

0>1 Y1pN1

X

0>2 Y2pN2

3

5 =

2

41pN1

(E[X 0>1 X 01]E[�1] + E[X 0>1 ]E[✏1])1pN2

(E[X 0>2 X 02]E[�2] + E[X 0>2 ]E[✏2])

3

5 = 0

The within-population variance is:

Cov[Z1i, Z1j ] = E[Z1iZ1j ] = EX,�,✏[E[Z1iZ1j |X,�, ✏]]

=1

N1EX,�,✏

[X 0>1i (X01�1 + ✏1)(X

01�1 + ✏1)

>X 01j ]

=1

N1EX,�

[X 0>1i X01�1�

>1 X

0>1 X2j ] +

1

N1EX,✏

[X 0>1i ✏1✏>1 X

01j ]

=h21

MN1EX

[X 0>1i X01X

0>1 X

01j ] +

1� h21N1

EX

[X 0>1i X01j ]

=h21

MN1(N1Mr1ij +N1

MX

m=1

r1imr1jm +N21

MX

m=1

r1imr1jm) +1� h21N1

r1ij

= r1ij +N1 + 1

Mh21⌃1(i)⌃

(j)1

where rpij

= ⌃pij

is the correlation coe�cient of SNP i and j in population p. Similarly, the between-

population variance is:

Cov[Z1i, Z2j ] =1p

N1N2EX,�

[X 0>1i X01�1�

>2 X

0>2 X

02j ] +

1pN1N2

EX,✏

[X 0>1i ✏1✏>2 X

02j ]

=h2ix

MpN1N2

EX

[X 0>1i X01X

0>2 X

02j ]

=h2ix

MpN1N2

(N1N2

MX

m=1

r1imr2jm)

=

pN1N2M

h2ix

⌃1(i)⌃(j)2

where ⌃(i) denotes the i’th row of ⌃ and ⌃(j) denotes the j’th column. The covariance of the Z-scores is

2




thus

C = Var(Z) =

2

4 ⌃1 +N1+1M

h21⌃21 h

2gx

pN1N2

M

⌃1⌃2

h2gx

pN1N2

M

⌃2⌃1 ⌃2 +N2+1M

h22⌃22

3

5 (3)

and Z ⇠ N (0, C).

Genetic e↵ect correlation

Let hex

= ⇢ge

ph21h

22. We modify the procedure above to use mean-centered instead of normalized genotype

matrices and model the distribution of the e↵ect sizes as

0

@ �1�2

1

A ⇠ N

0

B@

2

4 0

0

3

5 ,

2

64h

21

k�21k1IM

h

expk�21k1k�22k1

IM

h

expk�21k1k�22k1

IM

h

22

k�22k1IM

3

75

1

CA (4)

Notice that a linear model with e↵ects sizes acting on un-normalized genotypes is the same as a linear model

with e↵ect sizes acting on normalized genotypes under the substitution �1,2 !q

�21,2�1,2. Therefore the

covariance of Z-scores on the per allele scale can be immediately inferred from the prior derivation

C = Var(Z) =

2

64⌃1 +

N1+1k�21k1

h21⌃1�21⌃1 h

2gx

pN1N2p

k�21k1k�22k1⌃1

p�21�

22⌃2

h2gx

pN1N2p

k�21k1k�22k1⌃2

p�22�

21⌃1 ⌃2 +

N2+1k�22k1

h22⌃2�22⌃2

3

75

Approximate maximum likelihood estimation

Let C =

2

4 C11 C12C21 C22

3

5 be either of the above covariance matrices written in block form. We approximately

optimize the above likelihood as follows: first we find h21 and h22 by maximizing the likelihood corresponding

to C11 and C22, then we find ⇢gi or ⇢ge by maximizing the likelihood corresponding to C12:

l(h21|Z1,⌃,�) ⇡ �MX

i=1

w11i

✓ln(C11ii) +

Z21iC11ii

◆

l(h22|Z2,⌃,�) ⇡ �MX

i=1

w22i

✓ln(C22ii) +

Z22iC22ii

◆

l(⇢g{i,e}|Z, ĥ21, ĥ22,⌃,�) ⇡ �

MX

i=1

w12i

✓ln(C12ii) +

Z1iZ2iC12ii

◆

Because we are discarding between-SNP covariance information (Cov(Z1i, Z1j)), highly correlated SNPs will

be overcounted in our approximate likelihood. As a simple example, notice that two SNPs in perfect LD

will each contribute identical terms to the approximate likelihood, and therefore should be downweighted

by a factor of 1/2. The extent to which SNP i is over-counted is exactly the i’th entry in it’s corresponding

3




LD-matrix product. Therefore we let wgijki

= 1/ (⌃j

⌃k

)ii

and wgejki

= 1/⇣⌃

j

q�2j

�2k

⌃k

⌘

ii

to reduce the

variance in our estimates of the parameters h21, h22, ⇢gi and ⇢ge.

Furthermore, rather than compute the full products ⌃21, ⌃22 and ⌃1⌃2 over all M SNPs in the genome, we

choose a window size W and approximate the product by (⌃a

⌃b

)ii

=P

w=i+Ww=i�W raiwrbiw. Though maximum

likelihood estimation admits a straightforward estimate of the standard error via the fisher information, we

found these estimates to be inaccurate in practice. Instead, we use block jackknife with block size equal to

min(100, M200 ) SNPs to ensure that blocks are large enough to remove residual correlations.

Out of population prediction of phenotypic values

Consider using the results of a GWAS with perfect power in population 2 to predict the phenotypic values of

a set of individuals from population 1. This defines the upper limit of the correlation of true and predicted

phenotypic values. Let the true values of the e↵ects sizes in population 2 be �2. Let the true phenotypes

in population 1 be Y = X1�1 + ✏1 while the predicted phenotypes are P = X1�2. We are interested in the

correlaiton of the predicted and true phenotypes ⇢MAXY P

= Cor(Y, P ). Notice that, given X, the true and

predicted phenotype of each individual is an a�ne transformation of a multivariate normal random variable

(�) 2

4 YiPi

3

5 =

2

4 X(i) 0M0M

X(i)

3

5

2

4 �1�2

3

5+

2

4 ✏i0

3

5

and therefore (Yi

, Pi

) for individual i is multivariate normal with expected covariance matrix

EX

[Cov(Yi

, Pi

)] = EX

2

4 X(i) 0M0M

X(i)

3

5

2

641

k�21k1IM

h

expk�21k1k�22k1

IM

h

expk�21k1k�22k1

IM

h

22

k�22k1IM

3

75

2

4 X(i) 0M0M

X(i)

3

5>

= EX

2

64

Pm

X

2im

k�21k1h

ex

Pm

X

2imp

k�21k1k�22k1h

ex

Pm

X

2imp

k�21k1k�22k1h

22

Pm

X

2im

k�22k1

3

75

=

2

4 1 hexq

k�21k1k�22k1

hex

qk�21k1k�22k1

h22k�21k1k�22k1

3

5

and therefore the expected correlation E[Cor(Yi

, Pi

)] is hexph

22

qk�21k1k�22k1

k�22k1k�21k1

= ⇢ge

ph21. The expected popula-

tion correlation tends to the sample correlation as the number of samples increases, therefore

⇢MAXY P

= Cor(Y, P ) ! ⇢ge

qh21 (5)

as N ! 1

4




13

DescriptionofSupplementaldata

Supplementaldataincludesixfiguresandonetable.

Acknowledgements

TheauthorswouldliketoacknowledgeLiorPachterandHilaryFinucaneforinsightful

discussionabouttheproblem.BCBissupportedbytheNSFGRFP.ALPissupportedbyNIH

grantR01HG006399.NZissupportedbyNIHgrantK25HL121295.

WebResources

Popcornisavailableathttps://github.com/brielin/popcorn

References

1. Robinson, M.R., Hemani, G., Medina-Gomez, C., Mezzavilla, M., Esko, T., Shakhbazov, K., Powell, J.E., Vinkhuyzen, A., Berndt, S.I., Gustafsson, S., et al. (2015). Population genetic differentiation of height and body mass index across Europe. Nat. Genet. 47, 1357–1362. 2. Burt, V.L., Whelton, P., Roccella, E.J., Brown, C., Cutler, J.A., Higgins, M., Horan, M.J., and Labarthe, D. (1995). Prevalence of hypertension in the US adult population. Results from the Third National Health and Nutrition Examination Survey, 1988-1991. Hypertension 25, 305–313. 3. Coram, M.A., Candille, S.I., Duan, Q., Chan, K.H.K., Li, Y., Kooperberg, C., Reiner, A.P., and Tang, H. (2015). Leveraging Multi-ethnic Evidence for Mapping Complex Traits in Minority Populations: An Empirical Bayes Approach. Am. J. Hum. Genet. 96, 740–752. 4. Morris, A.P. (2011). Transethnic Meta-Analysis of Genomewide Association Studies. Genet. Epidemiol. 35, 809–822. 5. Bustamante, C.D., De La Vega, F.M., and Burchard, E.G. (2011). Genomics for the world. Nature 475, 163–165. 6. (2011). A genome-wide association study in Europeans and South Asians identifies five new loci for coronary artery disease. Nat Genet 43, 339–344. 7. Okada, Y., Wu, D., Trynka, G., Raj, T., Terao, C., Ikari, K., Kochi, Y., Ohmura, K., Suzuki, A., Yoshida, S., et al. (2014). Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376–381. 8. de Candia, T.R., Lee, S.H., Yang, J., Browning, B.L., Gejman, P.V., Levinson, D.F., Mowry, B.J., Hewitt, J.K., Goddard, M.E., O’Donovan, M.C., et al. (2013). Additive Genetic Variation in Schizophrenia Risk Is Shared by Populations of African and European Descent. Am. J. Hum. Genet. 93, 463–470. 9. Lee, S., Teslovich, T.M., Boehnke, M., and Lin, X. (2013). General Framework for Meta-analysis of Rare Variants in Sequencing Association Studies. Am. J. Hum. Genet. 93, 42–53. 10. Palla, L., and Dudbridge, F. (2015). A Fast Method that Uses Polygenic Scores to Estimate the Variance Explained by Genome-wide Marker Panels and the Proportion of Variants Affecting a Trait. Am. J. Hum. Genet. 97, 250–259. 11. Yang, J., Ferreira, T., Morris, A.P., Medland, S.E., Madden, P.A.F., Heath, A.C., Martin, N.G., Montgomery, G.W., Weedon, M.N., Loos, R.J., et al. (2012). Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375. 12. Pasaniuc, B., Zaitlen, N., Shi, H., Bhatia, G., Gusev, A., Pickrell, J., Hirschhorn, J., Strachan, D.P., Patterson, N., and Price, A.L. (2014). Fast and accurate imputation of summary statistics enhances evidence of functional enrichment. Bioinformatics 30, 2906–2914.




14

13. Hormozdiari, F., Kostem, E., Kang, E.Y., Pasaniuc, B., and Eskin, E. (2014). Identifying Causal Variants at Loci with Multiple Signals of Association. Genetics 198, 497–508. 14. Hormozdiari, F., Kichaev, G., Yang, W.-Y., Pasaniuc, B., and Eskin, E. (2015). Identification of causal genes for complex traits. Bioinformatics 31, i206–i213. 15. Kichaev, G., Yang, W.-Y., Lindstrom, S., Hormozdiari, F., Eskin, E., Price, A.L., Kraft, P., and Pasaniuc, B. (2014). Integrating Functional Data to Prioritize Causal Variants in Statistical Fine-Mapping Studies. PLoS Genet. 10, e1004722. 16. Bulik-Sullivan, B.K., Loh, P.-R., Finucane, H.K., Ripke, S., Yang, J., Schizophrenia Working Group of the Psychiatric Genomics Consortium, Patterson, N., Daly, M.J., Price, A.L., and Neale, B.M. (2015). LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 47, 291–295. 17. Finucane, H.K., Bulik-Sullivan, B., Gusev, A., Trynka, G., Reshef, Y., Loh, P.-R., Anttila, V., Xu, H., Zang, C., Farh, K., et al. (2015). Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat Genet 47, 1228–1235. 18. Bulik-Sullivan, B., Finucane, H.K., Anttila, V., Gusev, A., Day, F.R., Loh, P.-R., ReproGen Consortium, Psychiatric Genomics Consortium, Genetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control Consortium 3, Duncan, L., et al. (2015). An atlas of genetic correlations across human diseases and traits. Nat Genet 47, 1236–1241. 19. Park, D.S., Brown, B., Eng, C., Huntsman, S., Hu, D., Torgerson, D.G., Burchard, E.G., and Zaitlen, N. (2015). Adapt-Mix: learning local genetic correlation structure improves summary statistics-based analyses. Bioinformatics 31, i181–i189. 20. Xu, Z., Duan, Q., Yan, S., Chen, W., Li, M., Lange, E., and Li, Y. (2015). DISSCO: direct imputation of summary statistics allowing covariates. Bioinformatics 31, 2434–2442. 21. International Schizophrenia Consortium (2009). Common polygenic variation contributes to risk of schizophrenia that overlaps with bipolar disorder. Nature 460, 748–752. 22. Zuo, L., Zhang, C.K., Wang, F., Li, C.-S.R., Zhao, H., Lu, L., Zhang, X.-Y., Lu, L., Zhang, H., Zhang, F., et al. (2011). A Novel, Functional and Replicable Risk Gene Region for Alcohol Dependence Identified by Genome-Wide Association Study. PLoS ONE 6, e26726. 23. Fesinmeyer, M., North, K., Ritchie, M., Lim, U., Franceschini, N., Wilkens, L., Gross, M., Bůžková, P., Glenn, K., Quibrera, P., et al. (2013). Genetic risk factors for body mass index and obesity in an ethnically diverse population: results from the Population Architecture using Genomics and Epidemiology (PAGE) Study. Obes. Silver Spring Md 21, 10.1002/oby.20268. 24. Chang, M., Ned, R.M., Hong, Y., Yesupriya, A., Yang, Q., Liu, T., Janssens, A.C.J.W., and Dowling, N.F. (2011). Racial/Ethnic Variation in the Association of Lipid-Related Genetic Variants With Blood Lipids in the US Adult Population. Circ. Cardiovasc. Genet. 4, 523–533. 25. Waters, K.M., Stram, D.O., Hassanein, M.T., Le Marchand, L., Wilkens, L.R., Maskarinec, G., Monroe, K.R., Kolonel, L.N., Altshuler, D., Henderson, B.E., et al. (2010). Consistent Association of Type 2 Diabetes Risk Variants Found in Europeans in Diverse Racial and Ethnic Groups. PLoS Genet. 6, e1001078. 26. Lee, S.H., Yang, J., Goddard, M.E., Visscher, P.M., and Wray, N.R. (2012). Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood. Bioinformatics 28, 2540–2542. 27. Falconer, D.S., and Mackay, T.F.C. (1996). Introduction to quantitative genetics (Essex, England: Longman). 28. Loh, P.-R., Tucker, G., Bulik-Sullivan, B.K., Vilhjálmsson, B.J., Finucane, H.K., Salem, R.M., Chasman, D.I., Ridker, P.M., Neale, B.M., Berger, B., et al. (2015). Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290. 29. Yang, J., Benyamin, B., McEvoy, B.P., Gordon, S., Henders, A.K., Nyholt, D.R., Madden, P.A., Heath, A.C., Martin, N.G., Montgomery, G.W., et al. (2010). Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569. 30. So, H.-C., Li, M., and Sham, P.C. (2011). Uncovering the total heritability explained by all true susceptibility variants in a genome-wide association study. Genet. Epidemiol. n/a – n/a. 31. Vattikuti, S., Guo, J., and Chow, C.C. (2012). Heritability and Genetic Correlations Explained by Common SNPs for Metabolic Syndrome Traits. PLoS Genet. 8, e1002637.




15

32. ’t Hoen, P.A.C., Friedländer, M.R., Almlöf, J., Sammeth, M., Pulyakhina, I., Anvar, S.Y., Laros, J.F.J., Buermans, H.P.J., Karlberg, O., Brännvall, M., et al. (2013). Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories. Nat. Biotechnol. 31, 1015–1022. 33. Morris, A.P., Voight, B.F., Teslovich, T.M., Ferreira, T., Segrè, A.V., Steinthorsdottir, V., Strawbridge, R.J., Khan, H., Grallert, H., Mahajan, A., et al. (2012). Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat. Genet. 44, 981–990. 34. Cho, Y.S., Chen, C.-H., Hu, C., Long, J., Hee Ong, R.T., Sim, X., Takeuchi, F., Wu, Y., Go, M.J., Yamauchi, T., et al. (2012). Meta-analysis of genome-wide association studies identifies eight new loci for type 2 diabetes in east Asians. Nat Genet 44, 67–72. 35. Su, Z., Marchini, J., and Donnelly, P. (2011). HAPGEN2: simulation of multiple disease SNPs. Bioinformatics 27, 2304–2305. 36. Chang, C.C., Chow, C.C., Tellier, L.C., Vattikuti, S., Purcell, S.M., and Lee, J.J. (2015). Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4,. 37. Lee, S.H., Wray, N.R., Goddard, M.E., and Visscher, P.M. (2011). Estimating Missing Heritability for Disease from Genome-wide Association Studies. Am. J. Hum. Genet. 88, 294–305. 38. Speed, D., Hemani, G., Johnson, M.R., and Balding, D.J. (2012). Improved Heritability Estimation from Genome-wide SNPs. Am. J. Hum. Genet. 91, 1011–1021. 39. Yang, J., Bakshi, A., Zhu, Z., Hemani, G., Vinkhuyzen, A.A.E., Lee, S.H., Robinson, M.R., Perry, J.R.B., Nolte, I.M., van Vliet-Ostaptchouk, J.V., et al. (2015). Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nat. Genet. 47, 1114–1120. 40. Price, A.L., Helgason, A., Thorleifsson, G., McCarroll, S.A., Kong, A., and Stefansson, K. (2011). Single-Tissue and Cross-Tissue Heritability of Gene Expression Via Identity-by-Descent in Related or Unrelated Individuals. PLoS Genet. 7, e1001317. 41. Gaffney, D.J. (2013). Global Properties and Functional Complexity of Human Gene Regulatory Variation. PLoS Genet. 9, e1003501. 42. Stahl, E.A., Wegmann, D., Trynka, G., Gutierrez-Achury, J., Do, R., Voight, B.F., Kraft, P., Chen, R., Kallberg, H.J., Kurreeman, F.A.S., et al. (2012). Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nat. Genet. 44, 483–489. 43. Marenberg, M.E., Risch, N., Berkman, L.F., Floderus, B., and de Faire, U. (1994). Genetic susceptibility to death from coronary heart disease in a study of twins. N. Engl. J. Med. 330, 1041–1046. 44. Nora, J.J., Lortscher, R.H., Spangler, R.D., Nora, A.H., and Kimberling, W.J. (1980). Genetic--epidemiologic study of early-onset ischemic heart disease. Circulation 61, 503–508. 45. Chen, X., Kuja-Halkola, R., Rahman, I., Arpegård, J., Viktorin, A., Karlsson, R., Hägg, S., Svensson, P., Pedersen, N.L., and Magnusson, P.K.E. (2015). Dominant Genetic Variation and Missing Heritability for Human Complex Traits: Insights from Twin versus Genome-wide Common SNP Models. Am. J. Hum. Genet. 97, 708–714. 46. Zhu, Z., Bakshi, A., Vinkhuyzen, A.A.E., Hemani, G., Lee, S.H., Nolte, I.M., van Vliet-Ostaptchouk, J.V., Snieder, H., Esko, T., Milani, L., et al. (2015). Dominance Genetic Variation Contributes Little to the Missing Heritability for Human Complex Traits. Am. J. Hum. Genet. 96, 377–385. 47. Marchini, J., and Howie, B. (2010). Genotype imputation for genome-wide association studies. Nat Rev Genet 11, 499–511. 48. Ma, R.C.W., and Chan, J.C.N. (2013). Type 2 diabetes in East Asians: similarities and differences with populations in Europe and the United States: Diabetes in East Asians. Ann. N. Y. Acad. Sci. 1281, 64–91.

FigureTitlesandLegends

Figure1:Trueandestimatedgeneticimpactandeffectcorrelation.Allsimulations

conductedwithsimulatedEURandEASheritabilityof0.5using4499simulatedEURand

4836simulatedEASindividualsat248,953SNPs.

Figure2:DistributionofgeneticcorrelationcomparisonbetweenpopcornandGCTA.

Distributionwascomputedusingagaussiankdeonthesetofgeneticcorrelationestimates.




16

Figure3:Geneticcorrelationasafunctionofheritabilityforgeneexpression.Themeanand

standarderrorofthegeneticcorrelationofthesetofgeneswithh12andh22exceeding

thresholdhineachanalysis(y-axis)isplottedagainsth(x-axis).

Tables

MAF>0.01 MAF>0.05

Model Independent 0.500 0.478 0.500 0.460 0.500 0.488 0.509 0.469

Inverse 0.431 0.500 0.567 0.496 0.479 0.500 0.555 0.482

Rare 0.500 0.467 0.382 0.863 0.500 0.496 0.998 0.756

Common 0.500 0.500 0.522 0.493 0.500 0.500 0.502 0.496

Difference 0.500 0.416 0.354 0.435 0.500 0.461 0.410 0.412

Adversarial 0.710 0.604 0.525 0.651 0.714 0.667 0.601 0.675

Table1:Trueandestimatedvaluesofthegeneticimpactandeffectcorrelationinsimulated

EUR-likeandEAS-likegenotypes.Resultsaretheaverageof100simulationswith

phenotypeheritabilityof0.5ineachpopulation.

Table2:HeritabilityandgeneticcorrelationofRAandT2DbetweenEURandEAS

populations.EURRAdatacontained8,875casesand29,367controlsforastudyprevalence

of0.23.EASRAdatacontained4,873casesand17,642controlsforastudyprevalenceof

0.22.RAdiseaseprevalencewasassumedtobe0.5%inbothpopulations7.T2DEURdata

contained12171casesand56862controlsforastudyprevalenceof0.18.T2DEASdata

⇢ge ⇢gi ⇢̂ge ⇢̂gi ⇢ge ⇢gi ⇢̂ge ⇢̂gi

hEUR2liability hEAS2liability ρge ρgi

RA Est.(SE) 0.18(0.02) 0.22(0.03) 0.46(0.06) 0.46(0.06)

95%CI [0.15,0.21] [0.16,0.28] [0.34, 0.58] [0.34, 0.58]

p>0 3.90e-32 1.89e-17 1.37e-15 8.16e-16

p0 2.41e-77 5.73e-7 1.70e-12 2.85e-13

p

17

contained6952casesand11865controlsforastudyprevalenceof0.37.T2DEUR

prevalencewasassumedtobe8%33whileT2DEASprevalencewasassumedtobe9%48.




Transethnic genetic correlation estimates from summary ... · 2/23/2016 · Consortium, Chun...

Documents

Transcript of Transethnic genetic correlation estimates from summary ... · 2/23/2016 · Consortium, Chun...