It is made available under a CC-BY-NC-ND 4.0 International ... · Consortium, Chun Jimmie Ye, Alkes...
Transcript of It is made available under a CC-BY-NC-ND 4.0 International ... · Consortium, Chun Jimmie Ye, Alkes...
1
Transethnicgeneticcorrelationestimatesfromsummarystatisticssupportwidespread
non-additiveeffects
BrielinC.Brown,AsianGeneticEpidemiologyNetwork-Type2Diabetes(AGEN-T2D)Consortium,ChunJimmieYe,AlkesL.Price,NoahZaitlen
Abstract
Theincreasingnumberofgeneticassociationstudiesconductedinmultiple
populationsprovidesunprecedentedopportunitytostudyhowthegeneticarchitectureof
complexphenotypesvariesbetweenpopulations,aproblemimportantforbothmedicaland
populationgenetics.Herewedevelopamethodforestimatingthetransethnicgenetic
correlation;thecorrelationofcausalvarianteffectsizesatSNPscommoninpopulations.
Unlikesomepriorapproaches,wetakeadvantageoftheentirespectrumofSNP
associationsandutilizeonlysummary-levelGWASdata,therebyavoidingthe
computationalcostsandprivacyconcernsassociatedwithgenotype-levelinformationwhile
remainingscalabletohundredsofthousandsofindividualsandmillionsofSNPs.Weapply
ourmethodtogeneexpression,rheumatoidarthritis,andtype-twodiabetesdataand
overwhelminglyfindthatthegeneticcorrelationissignificantlylessthan1.Wearguethat
thisisevidenceforthepresenceofnon-additiveordifferentialtaggingeffectsthatmodify
themarginaleffectsizesatSNPscommoninbothpopulations.Ourmethodisimplemented
inapythonpackagecalledpopcorn.
Introduction
Manycomplexhumanphenotypesvarydramaticallyintheirdistributionsbetween
populationsduetoacombinationofgeneticandenvironmentaldifferences.Forexample,
northernEuropeansareonaveragetallerthansouthernEuropeans1andAfricanAmericans
haveanincreasedrateofhypertensionrelativetoEuropeanAmericans2.Thegenetic
contributiontopopulationphenotypicdifferentiationisdrivenbydifferencesincausal
allelefrequencies,effectsizes,andgeneticarchitectures.Understandingtherootcausesof
phenotypicdifferencesworldwidehasprofoundimplicationsforbiomedicalandclinical
practiceindiversepopulations,thetransferabilityofepidemiologicalresults,aidingmulti-
ethnicdiseasemapping3,4,assessingthecontributionofnon-additiveandrarevariant
effects,andmodelingthegeneticarchitectureofcomplextraits.Inthisworkweconsidera
centralquestionintheglobalstudyofphenotype:dogeneticvariantshavethesame
phenotypiceffectsindifferentpopulations?
.CC-BY-NC-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint. http://dx.doi.org/10.1101/036657doi: bioRxiv preprint first posted online Jan. 14, 2016;
2
WhilethevastmajorityofGWAShavebeenconductedinEuropeanpopulations5,
thegrowingnumberofnon-Europeanandmulti-ethnicstudies4,6,7provideanopportunity
tostudygeneticeffectdistributionsacrosspopulations.Forexample,onerecentstudyused
mixed-modelbasedmethodstoshowthatthegenome-widegeneticcorrelationof
schizophreniabetweenEuropeanandAfricanAmericansisnonzero8.Whilepowerful,
computationalcostsandprivacyconcernslimittheutilityofgenotype-basedmethods.In
thiswork,wemaketwosignificantcontributionstostudiesoftransethnicgenetic
correlation.First,weexpandthedefinitionofgeneticcorrelationtobetteraccountfora
transethniccontext.Second,wedevelopanapproachtoestimatinggeneticcorrelation
acrosspopulationsthatusesonlysummarylevelGWASdata.Similartootherrecent
summarystatisticsbasedheritabilitymethods9–11,ourapproachavoidsprivacyconcerns
andisscalabletohundredsofthousandsofindividualsandmillionsofmarkers.Unlike
traditionalapproachesthatfocusonthesimilarityofGWASresults12–16weutilizetheentire
spectrumofGWASassociationswhileaccountingforLDinordertoavoidfiltering
correlatedSNPs.
Inasinglepopulation,thegeneticcorrelationoftwophenotypesisdefinedasthe
correlationcoefficientofSNPeffectsizes17,18.Inmultiplepopulations,differencesinallele
frequencymotivatemultiplepossibledefinitionsofgeneticcorrelation.Hereweconsider
boththecorrelationofalleleeffectsizesaswellasthecorrelationofallelicimpact,which
takesintoaccountthefrequencyofthevariantinthepopulation.Forexample,avariant
mayhaveamuchhighereffectsizebutmuchlowerfrequencyinonepopulation.Therefore,
wedefinethetransethnicgeneticeffectcorrelation(ρge)asthecorrelationcoefficientofthe
per-alleleSNPeffectsizes,andthetransethnicgeneticimpactcorrelation(ρgi)asthe
correlationcoefficientofthepopulation-specificallelevariancenormalizedSNPeffectsizes.
Whileotherdefinitionsofthegeneticcorrelationarepossible(seediscussion),these
quantitiescapturetwoimportantquestionsaboutthestudyofdiseaseinmultiple
populations:towhatextentdothesamemutationsinmultiplepopulationsdifferintheir
phenotypiceffectsandtowhatextentarethesedifferencesmitigatedorexacerbatedby
differencesinallelefrequency?
Toestimategeneticcorrelation,wetakeaBayesianapproachwhereinweassume
genotypesaredrawnrandomlyfromtheirpopulationandeffectssizeshaveanormalprior
(theinfinitesimalmodel19).Whileunlikelytorepresentreality,thismodelhasbeenused
successfullyinpractice8,20–23.Theinfinitesimalassumptionyieldsamultivariatenormal
.CC-BY-NC-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint. http://dx.doi.org/10.1101/036657doi: bioRxiv preprint first posted online Jan. 14, 2016;
3
distributionontheobservedteststatistics(Z-scores),whichisafunctionoftheheritability
andgeneticcorrelation.RatherthanpruningSNPsinLD10,24,25,thisallowsustoexplicitly
modeltheresultinginflationofZ-scores.Wethenmaximizeanapproximateweighted
likelihoodfunctiontofindtheheritabilityandgeneticcorrelation.Thismethodis
implementedinapythonpackagecalledpopcorn.Thoughderivedforquantitative
phenotypes,popcornextendseasilytobinaryphenotypesundertheliabilitythreshold
model.Weshowviaextensivesimulationthatpopcornproducesunbiasedestimatesofthe
geneticcorrelationandthepopulationspecificheritabilities,withastandarderrorthat
decreasesasthenumberofSNPsandindividualsinthestudiesincreases.Furthermore,we
showthatourapproachisrobusttoviolationsoftheinfinitesimalassumption.
WeapplypopcorntoEuropeanandYorubangeneexpressiondata26aswellasGWAS
summarystatisticsfromEuropeanandEastAsianrheumatoidarthritisandtype-two
diabetescohorts,27,28.OuranalysisofgEUVADISshowsthatoursummarystatisticbased
estimatorisconcordantwiththemixedmodelbasedestimator.Wefindthatthemean
transethnicgeneticcorrelationacrossallgenesislow(ρge=0.320(0.009)),butincreases
substantiallywhenthegeneishighlyheritableinbothpopulations(ρge=0.772(0.017)).We
findthegeneticeffectcorrelationinRAandT2Dtobe0.463(0.058)and0.621(0.088),
respectively.
Acrossallphenotypesconsidered,weoverwhelminglyfindthatthetransethnic
geneticcorrelationissignificantlylessthanone.Therearemanyphenomenathat
contributetothis,including:untypedandunimputedrarevariantscorrelatedwith
observedSNPs;gene-geneinteractionsordominanceeffects,gene-environment
interactions,includingepigeneticeffects,thataredifferentialbetweenpopulations;and
differencesinsub-phenotypecomposition.Ourresultsthereforeshowthatthese
phenomenasignificantlyaltertheeffectsizesofSNPscommontobothpopulations.
Methods
ConsiderGWASofaphenotypeconductedintwodifferentpopulations.Assumewe
haveN1individualsgenotypedonMSNPsinstudyoneandN2individualsgenotypedonthe
sameSNPsstudytwo.LetX1,X2bethematricesofmean-centeredgenotypesinstudyone
andstudytwo,respectively,andletY1,Y2benormalizedphenotypesoftheindividualsin
studyoneandtwo,respectively.Letf1,f2bevectorsoftheallelefrequenciesoftheMSNPs
commontobothpopulations.AssumingHardy-Weinbergequilibrium,theallelevariances
.CC-BY-NC-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint. http://dx.doi.org/10.1101/036657doi: bioRxiv preprint first posted online Jan. 14, 2016;
4
areσ12=2f1(1–f1),σ22=2f2(1–f2).Letβ1,β2bethe(unobserved)per-alleleeffectsizesfor
eachSNPinstudiesoneandtwo,respectively.Theheritabilityinstudyoneisthenh12=Σi
σ1i2β1i2(andlikewiseforstudytwo).Theobjectiveofthisworkistoestimatetransethnic
geneticcorrelationfromsummarystatisticsofcommonvariants (and
likewiseforstudytwo)andestimatesofpopulationLDmatrices(Σ1andΣ2)fromexternal
referencepanels.
Therearemultiplepossibledefinitionsoftransethnicgeneticcorrelationandinthis
workweconsiderthegeneticeffectcorrelationρge=Cor(β1,β2),previouslydefinedbyLeeet
al17andimplementedinGCTA,whileadditionallydefiningthegeneticimpactcorrelationρgi
=Cor(σ1β1,σ2β2).Intuitively,thegeneticeffectcorrelationmeasurestheextenttowhichthe
samevarianthasthesamephenotypicchange,whilethegeneticimpactcorrelationgives
moreweighttocommonallelesthanrareonesseparatelyineachpopulation.Forexample,
iftheeffectsizesarethesameineachpopulationρge=1butρgi<1becauseofallele
frequencydifferencesbetweenthepopulations.Inthiscaseρgi<ρgehowevertheopposite
canalsobetrue.Ifrarealleleshavelargereffectsizesandtherearemanyallelescommonin
studyonebutrareinstudytwo,thenρgiwillbegreaterthanρge.Inthiscase,thedifferences
ineffectsizesaremitigatedbycorrespondingdifferencesinallelefrequency.
Weassumethegenotypesaredrawnrandomlyfromeachpopulationandthat
phenotypesaregeneratedbythelinearmodelY1=X1β1+ε1(likewiseforphenotypetwo).
Wheneffectsizesβareassumedinverselyproportionaltoallelefrequency,asiscommonly
done21,23,weshow(Appendix)thatunderthelinearinfinitesimalgeneticarchitecture,the
jointdistributionoftheZ-scoresfromeachstudyisasymptoticallymultivariatenormalwith
mean andvariance:
(1)
However,wheneffectssizesareassumedindependentofallelefrequencyweshow:
(2)
Z1 =(X1/�1)>Y1p
N1
�!0
Var(Z) =
"⌃1 +
N1+1M h2
1⌃21 ⇢gi
ph21h
22
pN1N2
M ⌃1⌃2
⇢gip
h21h
22
pN1N2
M ⌃2⌃1 ⌃2 +N2+1M h2
2⌃22
#
Var(Z) =
2
4⌃1 +
N1+1k�2
1k1h21⌃1�2
1⌃1 ⇢geph21h
22
pN1N2p
k�21k1k�2
2k1
⌃1
p�21�
22⌃2
⇢gep
h21h
22
pN1N2p
k�21k1k�2
2k1
⌃2
p�22�
21⌃1 ⌃2 +
N2+1k�2
2k1h22⌃2�2
2⌃2
3
5
.CC-BY-NC-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint. http://dx.doi.org/10.1101/036657doi: bioRxiv preprint first posted online Jan. 14, 2016;
5
Giventheseequationsforvariance,thequantitiesρgiorρgeandh12,h22canbe
estimatedbymaximizingthemultivariatenormallikelihood
,whereCiseitheroftheabovecovariance
matrices(1)and(2).BecauseΣ1andΣ2areestimatedfromfiniteexternalreferencepanels,
maximumlikelihoodestimationoftheabovemultivariatenormalleadstooverfitting.We
implementedtwooptimizationstoavoidthisproblem.First,wemaximizeanapproximate
weightedlikelihoodthatutilizesonlythediagonalelementsofeachblockofVar(Z).This
allowsustoaccountfortheLD-inducedinflationoftestsstatisticsbutdiscardscovariance
informationbetweenpairsofZ-scoresandthereforeleadstoovercountingZ-scoresofSNPs
inhighLD.Tocompensateforthis,wedownweightZ-scoresofSNPsinproportiontheirLD.
Second,ratherthancomputethefullproductsΣ12,Σ22andΣ1Σ2overallMSNPsinthe
genome,wechooseawindowsizeWandapproximatetheproductby
.TheseoptimizationsaresimilartothoseemployedbyLDscore
regression21.Thefulldetailsofthederivationandoptimizationareprovidedinthe
appendix.
Results
SimulatedGenotypesandSimulatedPhenotypes
Inordertoverifythatpopcornyieldsanunbiasedestimateoftheheritabilities(h12,
h22)andgeneticcorrelations(ρge,ρgi),weappliedpopcorntosummarystatisticsfrom
simulatedGWAS.Wesimulated50,000European-like(EUR)and50,000EastAsian-like
(EAS)individualsat248,953SNPsfromchromosomes1-3withallelefrequencyabove1%
inbothEuropeanandEastAsianHapMap3populationswithHapGen229.HapGen2
implementsahaplotyperecombinationwithmutationmodelthatresultsinexcesslocal
relatednessamongthesimulatedindividuals.Toaccountforthislocalstructure,we
randomlyremovedoneindividualfromeachpairwithglobalIBDcoefficientπ>0.15,
leaving20,085simulatedEURand20,006simulatedEAS.Additionally,weuseda5MB
windowinouranalysistoaccommodatelongerrangeLD.Fromthesesimulatedindividuals,
1000perpopulationwerechosenuniformlyatrandomtoserveasanexternalreference
panelforestimatingΣ1andΣ2.
l(⇢g{i,e}, h21, h
22|Z,⌃,�) / �ln(|C|)� Z>C�1Z
(⌃a⌃b)ii =w=i+WX
w=i�W
raiwrbiw
.CC-BY-NC-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint. http://dx.doi.org/10.1101/036657doi: bioRxiv preprint first posted online Jan. 14, 2016;
6
Ineachsimulationeffectsizesweredrawnfroma“spikeandslab”model,where
withprobabilitypand with
probability1-p.ρgiwasanalyticallycomputedfromthesimulatedeffectsizesandallele
frequenciesinthesimulatedreferencegenotypes.Quantitativephenotypesweregenerated
underalinearmodelwithi.i.d.noiseandnormalizedtohavemean0andvariance1,while
binaryphenotypesweregeneratedunderaliabilitythresholdmodelwhereindividualsare
labeledcaseswhentheirliabilityexceedsathreshold ,withKthe
populationdiseaseprevalence30.
Wevariedh12,h22,ρge,andρgi,aswellasthenumberofindividualsineachstudy(N1,
N2),thenumberofSNPs(M),thepopulationprevalenceK,andproportionofcausalvariants
(p)inthesimulatedGWASandgeneratedsummarystatisticsforeachstudy.Theresults
showninFigure1andFigureS1demonstratethattheestimatorsarenearlyunbiasedand
thatstandarderrorthatdecreaseswith(roughly)thesquarerootofthenumberofSNPs
andindividuals.Furthermore,byvaryingtheproportionofcausalvariantspweshowthat
ourestimatorisrobusttoviolationsoftheinfinitesimalassumption(FigureS2).Finally,we
showinTableS1thatourestimatesoftheheritabilityofliabilityincasecontrolstudiesare
nearlyunbiased.
Simulationswithnonstandarddiseasemodels
Ourapproach,aswellasgenotype-basedmethodssuchasGCTAmakeassumptions
aboutthegeneticarchitectureofcomplextraits.Previousworkhasshownthatviolationsof
theseassumptionscanleadtobiasinheritabilityestimation31,thereforewesoughtto
quantifytheextentthatthisbiasmayeffectourestimates.Wesimulatedphenotypesunder
sixdifferentdiseasemodels.Independent:effectsizeindependentofallelefrequency.
Inverse:effectsizeinverselyproportionaltoallelefrequency.Rare:onlySNPswithallele
frequencyunder10%affectthetrait.Common:onlySNPswithallelefrequencybetween
40%and50%affectthetrait.Difference:effectsizeproportionaltodifferenceinallele
frequency.Adversarial:differencemodelwithsignofbetasettoincreasethephenotypein
thepopulationwherethealleleismostcommon.Additionalgeneticarchitecturesare
possible,includingoneswhereeffectsizesarenotadirectfunctionofMAF32.
Wesimulatedphenotypesusinggenotypeswithallelefrequencyabove1%or5%
andcomparedthetrueandestimatedgeneticimpactandeffectcorrelationamongall
�1i,�2i ⇠ N✓0,
h21 ⇢ge
ph21h
222
⇢gep
h21h
22 h2
2
�◆
�1i,�2i = (0, 0)
⌧ = ��1(1�K)
.CC-BY-NC-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint. http://dx.doi.org/10.1101/036657doi: bioRxiv preprint first posted online Jan. 14, 2016;
7
models(Table1).WefindthatwhenonlySNPswithfrequencyabove5%inboth
populationsareused,thedifferenceinρgeandρgiisminimalexceptinthemostadversarial
cases.Evenintheadversarialmodel,thetruedifferenceisonly7%.Thoughunlikelyto
representreality,thefournonstandarddiseasemodelsresultinsubstantialbiasinour
estimators.WhenSNPswithallelefrequencyabove1%inbothpopulationsareincluded,
thedifferencesaremorepronounced.Thisisbecausethenormalizingconstant1/σrapidly
increasesastheSNPbecomesmorerare.Indeed,asSNPsbecomemorerarehavingan
accuratediseasemodelbecomesincreasinglyimportantandthereforeweproceedwitha
5%MAFcutoffinouranalysisofrealdataandusethenotationhc2torefertotheheritability
ofSNPswithallelefrequencyabove5%inbothpopulations(thecommon-SNPheritability).
Note,however,thatoneoftheadvantagesofmaximumlikelihoodestimationingeneralis
thatthelikelihoodcanbereformulatedtomimicthediseasemodelofinterest.
ValidationofPopcornusinggeneexpressioninGEUVADIS
Tofurthervalidateourapproach,wecomparedthecommon-SNPheritability(hc2)
andgeneticcorrelationestimatesofpopcorntoGCTAinthegEUVADISdatasetforwhich
rawgenotypesarepubliclyavailable.gEUVADISconsistsofRNA-seqdatafor464
lymphoblastoidcellline(LCL)samplesfromfivepopulationsinthe1000genomesproject.
Ofthese,375areofEuropeanancestry(CEU,FIN,GBR,TSI)and89areofAfricanancestry
(YRI).RawRNA-sequencingreadsobtainedfromtheEuropeanNucleotideArchivewere
alignedtothetranscriptomeusingUCSCannotationsmatchinghg19coordinates.RSEMwas
usedtoestimatetheabundancesofeachannotatedisoformandtotalgeneabundanceis
calculatedasthesumofallisoformabundancesnormalizedtoonemilliontotalcountsor
transcriptspermillion(TPM).ForeQTLmapping,CaucasianandYorubansampleswere
analyzedseparately.Foreachpopulation,TPMsweremediannormalizedtoaccountfor
differencesinsequencingdepthineachsampleandstandardizedtomean0andvariance1.
Ofthe29763totalgenes,9350withTPM>2inbothpopulationswerechosenforthis
analysis.
Foreachgeneweconductedacis-eQTLassociationstudyatallSNPswithin1
megabaseofthegenebodywithallelefrequencyabove5%inbothpopulationsusing30
principalcomponentsascovariates.WefoundthatGCTAandpopcornagreeontheglobal
distributionofheritability(FigureS3)andthatGCTA’sestimatesofgeneticcorrelationhave
asimilardistributiontopopcorn’sgeneticeffect(GE)andgeneticimpact(GI)correlation
.CC-BY-NC-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint. http://dx.doi.org/10.1101/036657doi: bioRxiv preprint first posted online Jan. 14, 2016;
8
estimates(Figure2).WhilethenumberofSNPsandindividualsincludedineachgene
analysisaretoosmalltoobtainaccuratepointestimatesofthegeneticcorrelationonaper-
genebasis(N=464,M=4279.5),thelargenumberofgenesenablesaccurateestimationof
theglobalmeanheritabilityandgeneticcorrelation.
Common-SNPheritabilityandgeneticcorrelationofgeneexpressioningEUVADIS
Wefindthattheaveragecis-hc2oftheexpressionofthegenesweanalyzedwas
0.093(0.002)inEURand0.088(0.002)inYRI.Ourestimatesarehigherthanpreviously
reportedaveragecis-heritabilityestimatesof0.055inwholebloodand0.057inadipose33,
whichcouldariseforseveralreasons.First,weremove68%ofthetranscriptsthatare
lowlyexpressedineithertheYRIorEURdata.Second,estimatesfromRNA-seqanalysisof
celllinesmightnotbedirectlycomparabletomicroarraydatafromtissue.
Theaveragegeneticeffectcorrelationwas0.320(0.010)whiletheaveragegenetic
impactcorrelationwas0.313(0.010).Notably,thegeneticcorrelationincreasesasthecis-
hc2ofexpressioninbothpopulationsincreases(Figure3).Inparticular,whenthecis-hc2of
thegeneisatleast0.2inbothpopulationsthegeneticeffectcorrelationwas0.772(0.017)
whilethegeneticimpactcorrelationwas0.753(0.018).
Inordertoverifythattherewerenosmall-samplesizeorconditioningbiasesinour
analysis,weanalyzedthegeneticcorrelationofsimulatedphenotypesoverthegEUVADIS
genotypes.Wesampledpairsofheritabilitiesfromtheestimatedexpressionheritability
distributionandsimulatedpairsofphenotypestohavethegivenheritabilityandagenetic
effectcorrelationof0.0overrandomlychosen4000baseregionsfromchromosome1ofthe
gEUVADISgenotypes.Withoutconditioning,theaverageestimatedgeneticeffect
correlationwas-0.002(0.003),indicatingthattheestimatorremainedunbiased.
Furthermore,theaverageestimatedgeneticeffectcorrelationwasnotsignificantlydifferent
from0.0conditionalontheestimatesofheritabilitybeingaboveacertainthreshold(Figure
S4).Notehoweverthatthesmallsamplesizeincreasesthevarianceoftheestimator
substantially.Becausethegeneticcorrelationisboundedbetween-1and1,thismayinduce
biaswhenthetruevalueisclosetotheboundaryandthesamplesizeissmall.
Wefindthatwhiletheaveragegeneticcorrelationislow,thegeneticcorrelation
increaseswiththecis-hc2ofthegene,indicatingthatascis-geneticregulationofgene
expressionincreasesitdoessosimilarlyinbothYRIandEURpopulations.Thismayhelp
interprettherecentobservationthatwhiletheglobalgeneticcorrelationofgeneexpression
.CC-BY-NC-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint. http://dx.doi.org/10.1101/036657doi: bioRxiv preprint first posted online Jan. 14, 2016;
9
acrosstissuesislow33,cis-eQTL’stendtoreplicateacrosstissues34.Asthepresenceofacis-
eQTLindicatessubstantialcis-geneticregulation,ananalysisofeQTLreplicationacross
tissuesisimplicitlyconditioningontheheritabilityofgeneexpressionbeinghighand
thereforemayindicatemuchhighergeneticcorrelationthanaverage.
SummarystatisticsofRAandT2D
Finally,wesoughttoexaminethetransethnicρgiandρgeinRAandT2Dcohortsfor
whichrawgenotypesarenotavailable.WeobtainedsummarystatisticsofGWASfor
rheumatoidarthritisandtype-2diabetesconductedinEuropeanandEastAsian
populations.WeremovedtheMHCregion(chromosome6,25–35Mb)fromtheRA
summarystatistics.Weestimatedthecommon-SNPheritabilityandgeneticcorrelation
using2539629SNPsgenotypedorimputedinbothRAstudiesand1054079SNPs
genotypedorimputedinbothT2Dstudieswithallelefrequencyabove5%in1000genomes
EURandEASpopulations.Thehc2andgeneticcorrelationestimatesarepresentedinTable
2.OurRAhc2estimatesof0.177(0.015)and0.221(0.026)forEURandEAS,respectively,
arelowerthanapreviouslyreportedmixed-modelbasedheritabilityestimatesof0.32
(0.037)inEuropeans35.Similarly,ourT2Dhc2estimatesof0.242(0.013)and0.105(0.021)
forEURandEAS,respectively,arelowerthanapreviouslyreportedmixed-modelbased
estimateof0.51(0.065)inEuropeans35.Westressthatthisdiscrepancyislikelyduetothe
differencebetweencommon-SNPheritabilityhc2andtotalnarrow-senseheritabilityhg2.
Furthermore,estimatesoftheheritabilityofT2Dfromfamilystudiescanvary
significantly36,37.
WefindthegeneticeffectcorrelationinRAandT2Dtobe0.463(0.058)and0.621
(0.088),respectively,whilethegeneticimpactcorrelationisnotsignificantlydifferentat
0.455(0.056)and0.606(0.083).Thetransethnicgeneticimpactandeffectcorrelationfor
bothphenotypesaresignificantlydifferentfromboth1and0(Table2),showingthatwhile
thereiscleargeneticoverlapbetweenthephenotypes,theperalleleeffectssizesdiffer
significantlybetweenthetwopopulations.Thisisevidenceforthepresenceofdifferential
taggingofuntypedandunimputed,possiblyrarevariantsaswellasgene-gene,gene-
environmentorothernon-additiveinteractions,themarginaleffectssizesofwhichare
modifiedbychangesinallelefrequencybetweenthetwopopulations.Whilewithin-locus
(dominance)interactionsmayalsoplayarole38,themagnitudeofthiseffecthasbeen
debated39.
.CC-BY-NC-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint. http://dx.doi.org/10.1101/036657doi: bioRxiv preprint first posted online Jan. 14, 2016;
10
Discussion
Wehavedevelopedthetransethnicgeneticeffectandgeneticimpactcorrelationand
providedanestimatorforthesequantitiesbasedonlyonsummary-levelGWASinformation
andsuitablereferencepanels.Wehaveappliedourestimatortoseveralphenotypes:
rheumatoidarthritis,type-2diabetesandgeneexpression.WhilethegEUVADISdataset
lacksthepowerrequiredtomakeinferencesaboutthegeneticcorrelationofsingleorsmall
subsetsofgenes,wecanmakeinferencesabouttheglobalstructureofgeneticcorrelationof
geneexpression.Wefindthattheglobalmeangeneticcorrelationislow,butthatit
increasessubstantiallywhentheheritabilityishighinbothpopulations.Inallphenotypes
analyzed,thegeneticcorrelationissignificantlydifferentfromboth0and1.Ourresults
showthatglobaldifferencesinSNPeffectsizeofcomplextraitscanbelarge.Incontrast,
effectssizesofgeneexpressionappeartobemoreconservedwherethereisstronggenetic
regulation.
Itisnotpossibletodrawconclusionsaboutpolygenicselectionfromestimatesof
transethnicgeneticcorrelation.Theeffectssizesmaybeidentical(!!" = 1)whilepolygenicselectionactstochangeonlytheallelefrequencies.Similarly,theeffectssizesmaybe
different(!!" < 1)withoutselection.DifferencesineffectssizesatcommonSNPscanresultfrommanyphenomena.Themarginaleffectsofageneinteractioncanbealtered
substantiallyviadifferencesinallelefrequency,whichmayariseduetodrift.Variantsthat
arerareinbothpopulationsbutdifferentiallytaggedmayalsocontribute,thoughweexpect
thiseffecttobesmall.Anothercontributionlikelycomesfromvariantsthatarerarein
populationonebutcommoninpopulationtwo(andviceversa),whichwillbefilteredinour
analysisbutmaybedifferentiallytagged.
Whilethegeneticcorrelationofmultiplephenotypesinonepopulationhasa
relativelystraightforwarddefinition,extendingthistomultiplepopulationsmotivates
multiplepossibleextensions.Inthisworkwehaveprovidedestimatorsforthecorrelation
ofgeneticeffectandgeneticimpactbutotherquantitiesrelatedtothesharedgeneticsof
complextraitsbetweenpopulationsincludethecorrelationofvarianceexplained
andproportionofsharedcausalvariantsbetweenthetwo
populations.Interestingly,whileourgoalwastoconstructanestimatorthatdeterminedthe
extentofgeneticsharingindependentofallelefrequency,weobservethatthecorrelationof
geneticeffectandgeneticimpactaresimilar.Furthermore,oursimulationsshowthatunder
⇢ge = Cor(�21�
21 ,�
22�
22)
.CC-BY-NC-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint. http://dx.doi.org/10.1101/036657doi: bioRxiv preprint first posted online Jan. 14, 2016;
11
arandomeffectsmodelutilizingonlySNPswithallelefrequencyabove5%inboth
populationsthetruegeneticeffectandgeneticimpactcorrelationaresimilar.Weconclude
thatatvariantscommoninbothpopulations,differencesineffectsizeandnotallele
frequencyaredrivingthetransethnicphenotypicdifferencesinthesetraits.
Estimatesofthetransethnicgeneticcorrelationareimportantforseveralreasons.
Asdetailedinthiswork,theyprovideinsightintotheextentthatnon-additivegenetic
variationaffectsaphenotype,furtheringourunderstandingofthegeneticarchitectureof
complextraits.Furthermore,theymayhelpinformbestpracticesfortransethnicmeta-
analysis,potentiallyofferingimprovementsovercurrentmethodsthatutilizefsttocluster
populationsforanalysis4.Finally,thetransethnicgeneticcorrelationconstrainsthelimitof
outofsamplephenotypepredictivepower.Ifthemaximumwithinpopulationcorrelationof
predictedphenotypePtotruephenotypeYis ,thenthemaximumoutof
populationcorrelationis (Appendix).
Ourapproachtoestimatinggeneticcorrelationhastwomajoradvantagesover
mixed-modelbasedapproaches.First,utilizingsummarystatisticsallowsapplicationofthe
methodwithoutdata-sharingandprivacyconcernsthatcomewithrawgenotypes.Second,
ourapproachislinearinthenumberofSNPsavoidingthecomputationalbottleneck
requiredtoestimatethegeneticrelationshipmatrix.Conceptually,ourapproachisvery
similartothattakenbyLDscoreregression.Indeed,thediagonaloftheLDmatrixproduct
inonepopulationareexactlytheLD-scores( ).Onecouldignoreourlikelihood-
basedapproachanddefinecross-populationscores inordertoexploitthe
linearrelationship (asimilarapproachcanbetakenfor
thegeneticeffectcorrelation).SinceLD-scoreregressionhasbeensuccessfullyusedto
computethegeneticcorrelationoftwophenotypesinasinglepopulation,thisderivation
canbeviewedasanextensionofLD-scoreregressiontoonephenotypeintwodifferent
populations.Themaindifferenceinourapproachischoosingmaximumlikelihoodrather
thanregressioninordertofitthemodel.Acomparisonofourmethodtotheldscsoftware
showstheyperformsimilarlyasheritabilityestimators(FigureS5).
Ofcourse,ourmethodisnotwithoutdrawbacks.First,itrequiresalargesample
sizeandlargenumberofSNPstoachievestandarderrorslowenoughtogenerateaccurate
estimates.UntilrecentlylargesampleGWAShavebeenrareinnon-Europeanpopulations,
⇢max
Y P
=q
h21
⇢max
Y P
= ⇢ge
qh21
⌃21ii = li
ci =X
m
r1imr2im
E[Z1iZ2i] =
pN1N2
M⇢gi
qh21h
22ci
.CC-BY-NC-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint. http://dx.doi.org/10.1101/036657doi: bioRxiv preprint first posted online Jan. 14, 2016;
12
thoughtheyarebecomingmorecommon.Similarly,referencepanelqualitymaysufferin
non-Europeanpopulationsandthismayimpactdownstreamanalysis40.Second,itislimited
toanalyzingrelativelycommonSNPs,bothbecausehavinganaccuratediseasemodelis
importantfortheanalysisofrarevariantsandbecauseeffectsizeandcorrelationcoefficient
estimateshaveahighstandarderroratrareSNPs21.Third,ouranalysisiscurrentlylimited
toSNPsthatarepresentinbothpopulations.Indeeditiscurrentlyunclearhowbestto
handlepopulation-specificvariantsinthisframework.Finally,admixedpopulationsinduce
verylong-rangeLDthatisnotaccountedforinourapproachandwearethereforelimited
toun-admixedpopulations21.
Ouranalysisleavesopenseveralavenuesforfuturework.Weapproximately
maximizethelikelihoodofan!×!multivariatenormaldistributionviaamethodthatutilizesonlythediagonalelementsofeachblock,discardingcovarianceinformation
betweenZ-scores.Abetterapproximationmaylowerthestandarderroroftheestimator,
facilitatingananalysisofthegeneticcorrelationoffunctionalcategories,pathwaysand
geneticregions.Wewouldalsoliketoextendouranalysistoincludepopulationspecific
variantsaswellasvariantsatfrequenciesbetween1-5%orlowerthan1%.Oursimulations
indicatethathavinganaccuratediseasemodelisimportantfordeterminingthedifference
betweenthegeneticeffectandgeneticimpactcorrelationwhenrarevariantsareincluded.
Maximumlikelihoodapproachesarewellsuitedtodifferentgeneticarchitectures,for
exampleonecouldexplicitlymodeltherelationshipbetweenallelefrequencyandeffectsize
,whereα=-1correspondstotheinverseassumptionandα=0corresponds
totheindependenceassumption.However,amorethoroughapproachwouldincorporate
additionalsourcesofinformationsuchastheeffectofselection.
2f(1� f) / �↵
.CC-BY-NC-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint. http://dx.doi.org/10.1101/036657doi: bioRxiv preprint first posted online Jan. 14, 2016;
Appendix
Consider two GWAS of a phenotype conducted in di↵erent populations populations. Assume we have N1
individuals genotyped or imputed to M SNPs in study one and N2 individuals genotyped or imputed to
M SNPs in study two. Let X1, X2 and Y1, Y2 be the matrices of mean-centered genotypes and phenotypes
of the individuals in study one and two, respectively, with f1, f2 the allele frequencies of the M SNPs
common to both populations. Assuming Hardy-Weinberg equilibrium, the allele variances are �21 = 2f1(1�
f1),�22 = 2f2(1� f2). Let �1,�2 be the (unobserved) per-allele e↵ect sizes for each SNP in studies one and
two, respectively. Define the genetic impact correlation ⇢gi
= Cor(p�21�1,
p�22�2) and the genetic e↵ect
correlation ⇢ge
= Cor(�1,�2). We present a maximum likelihood framework for estimating the heritability of
the phenotype in study 1 and it’s standard error, the heritability of the phenotype in study 2 and it’s standard
error, and the genetic e↵ect and impact correlation of the phenotype between the studies and it’s standard
error given only the summary statistics Z1, Z2 and reference genotypes G1, G2 representing the populations
in the studies. We assume that genotypes are drawn randomly from populations with expected correlation
matrices ⌃1 (and similarly for study two), and that every SNP is causal with a normally distributed e↵ects
size (though this assumption is not necessary in practice, see Figure S1).
Genetic impact correlation
Let X 01 = X1p
�
21
(and similarly for study 2) be normalized genotype matrices. We consider the standard linear
model for generation of the phenotypes, where
Y1 = X 01�1 + ✏1
Y2 = X 02�2 + ✏2
For convenience of notation let h2ix
= ⇢gi
ph21h
22. We assume the SNP e↵ects follow the infinitesimal model,
where every SNP has an e↵ect size drawn from the normal distribution, and that the residuals are independent
for each individual and normally distributed:
0
@ �1
�2
1
A ⇠ N
0
@
2
4 0
0
3
5 ,1
M
2
4 h21IM h2
ix
IM
h2ix
IM
h22IM
3
5
1
A (1)
0
@ ✏1
✏2
1
A ⇠ N
0
@
2
4 0
0
3
5 ,
2
4 (1� h21)IM 0
0 (1� h22)IM
3
5
1
A (2)
1
.CC-BY-NC-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint. http://dx.doi.org/10.1101/036657doi: bioRxiv preprint first posted online Jan. 14, 2016;
where h21, h
22 are the heritability of the disease in study one and two, respectively, and ⇢
gi
is the genetic
impact correlation.
Using the above model, we compute the distribution of the observed Z scores as a function of the reference
panel correlations and the model parameters (h21, h
22, ⇢gi). Given a distribution for Z and an observation of
Z we can then choose parameters which give the highest probability of observing Z. First, we compute the
distribution of Z. It is well known that the Z-scores of a linear regression are normally distributed given
� when the sample size is large enough. Since P(Z) / P(Z|�)P(�) and the product of normal distributions
is normal, we only need to compute the unconditional mean and variance of Z to know its distribution.
Specifically, let Z = [Z>1 , Z>
2 ]>, then it’s mean is
E[Z] = E
2
4X
0>1 Y1pN1
X
0>2 Y2pN2
3
5 =
2
41pN1
(E[X 0>1 X 0
1]E[�1] + E[X 0>1 ]E[✏1])
1pN2
(E[X 0>2 X 0
2]E[�2] + E[X 0>2 ]E[✏2])
3
5 = 0
The within-population variance is:
Cov[Z1i, Z1j ] = E[Z1iZ1j ] = EX,�,✏
[E[Z1iZ1j |X,�, ✏]]
=1
N1EX,�,✏
[X 0>1i (X
01�1 + ✏1)(X
01�1 + ✏1)
>X 01j ]
=1
N1EX,�
[X 0>1i X
01�1�
>1 X 0>
1 X2j ] +1
N1EX,✏
[X 0>1i ✏1✏
>1 X
01j ]
=h21
MN1EX
[X 0>1i X
01X
0>1 X 0
1j ] +1� h2
1
N1EX
[X 0>1i X
01j ]
=h21
MN1(N1Mr1ij +N1
MX
m=1
r1imr1jm +N21
MX
m=1
r1imr1jm) +1� h2
1
N1r1ij
= r1ij +N1 + 1
Mh21⌃1(i)⌃
(j)1
where rpij
= ⌃pij
is the correlation coe�cient of SNP i and j in population p. Similarly, the between-
population variance is:
Cov[Z1i, Z2j ] =1p
N1N2EX,�
[X 0>1i X
01�1�
>2 X 0>
2 X 02j ] +
1pN1N2
EX,✏
[X 0>1i ✏1✏
>2 X
02j ]
=h2ix
MpN1N2
EX
[X 0>1i X
01X
0>2 X 0
2j ]
=h2ix
MpN1N2
(N1N2
MX
m=1
r1imr2jm)
=
pN1N2
Mh2ix
⌃1(i)⌃(j)2
where ⌃(i) denotes the i’th row of ⌃ and ⌃(j) denotes the j’th column. The covariance of the Z-scores is
2
.CC-BY-NC-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint. http://dx.doi.org/10.1101/036657doi: bioRxiv preprint first posted online Jan. 14, 2016;
thus
C = Var(Z) =
2
4 ⌃1 +N1+1M
h21⌃
21 h2
gx
pN1N2
M
⌃1⌃2
h2gx
pN1N2
M
⌃2⌃1 ⌃2 +N2+1M
h22⌃
22
3
5 (3)
and Z ⇠ N (0, C).
Genetic e↵ect correlation
Let hex
= ⇢ge
ph21h
22. We modify the procedure above to use mean-centered instead of normalized genotype
matrices and model the distribution of the e↵ect sizes as
0
@ �1
�2
1
A ⇠ N
0
B@
2
4 0
0
3
5 ,
2
64h
21
k�21k1
IM
h
expk�2
1k1k�22k1
IM
h
expk�2
1k1k�22k1
IM
h
22
k�22k1
IM
3
75
1
CA (4)
Notice that a linear model with e↵ects sizes acting on un-normalized genotypes is the same as a linear model
with e↵ect sizes acting on normalized genotypes under the substitution �1,2 !q
�21,2�1,2. Therefore the
covariance of Z-scores on the per allele scale can be immediately inferred from the prior derivation
C = Var(Z) =
2
64⌃1 +
N1+1k�2
1k1h21⌃1�2
1⌃1 h2gx
pN1N2p
k�21k1k�2
2k1
⌃1
p�21�
22⌃2
h2gx
pN1N2p
k�21k1k�2
2k1
⌃2
p�22�
21⌃1 ⌃2 +
N2+1k�2
2k1h22⌃2�2
2⌃2
3
75
Approximate maximum likelihood estimation
Let C =
2
4 C11 C12
C21 C22
3
5 be either of the above covariance matrices written in block form. We approximately
optimize the above likelihood as follows: first we find h21 and h2
2 by maximizing the likelihood corresponding
to C11 and C22, then we find ⇢gi
or ⇢ge
by maximizing the likelihood corresponding to C12:
l(h21|Z1,⌃,�) ⇡ �
MX
i=1
w11i
✓ln(C11ii) +
Z21i
C11ii
◆
l(h22|Z2,⌃,�) ⇡ �
MX
i=1
w22i
✓ln(C22ii) +
Z22i
C22ii
◆
l(⇢g{i,e}|Z, h2
1, h22,⌃,�) ⇡ �
MX
i=1
w12i
✓ln(C12ii) +
Z1iZ2i
C12ii
◆
Because we are discarding between-SNP covariance information (Cov(Z1i, Z1j)), highly correlated SNPs will
be overcounted in our approximate likelihood. As a simple example, notice that two SNPs in perfect LD
will each contribute identical terms to the approximate likelihood, and therefore should be downweighted
by a factor of 1/2. The extent to which SNP i is over-counted is exactly the i’th entry in it’s corresponding
3
.CC-BY-NC-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint. http://dx.doi.org/10.1101/036657doi: bioRxiv preprint first posted online Jan. 14, 2016;
LD-matrix product. Therefore we let wgi
jki
= 1/ (⌃j
⌃k
)ii
and wge
jki
= 1/⇣⌃
j
q�2j
�2k
⌃k
⌘
ii
to reduce the
variance in our estimates of the parameters h21, h
22, ⇢gi and ⇢
ge
.
Furthermore, rather than compute the full products ⌃21, ⌃
22 and ⌃1⌃2 over all M SNPs in the genome, we
choose a window size W and approximate the product by (⌃a
⌃b
)ii
=P
w=i+W
w=i�W
raiw
rbiw
. Though maximum
likelihood estimation admits a straightforward estimate of the standard error via the fisher information, we
found these estimates to be inaccurate in practice. Instead, we use block jackknife with block size equal to
min(100, M
200 ) SNPs to ensure that blocks are large enough to remove residual correlations.
Out of population prediction of phenotypic values
Consider using the results of a GWAS with perfect power in population 2 to predict the phenotypic values of
a set of individuals from population 1. This defines the upper limit of the correlation of true and predicted
phenotypic values. Let the true values of the e↵ects sizes in population 2 be �2. Let the true phenotypes
in population 1 be Y = X1�1 + ✏1 while the predicted phenotypes are P = X1�2. We are interested in the
correlaiton of the predicted and true phenotypes ⇢MAX
Y P
= Cor(Y, P ). Notice that, given X, the true and
predicted phenotype of each individual is an a�ne transformation of a multivariate normal random variable
(�) 2
4 Yi
Pi
3
5 =
2
4 X(i) 0M
0M
X(i)
3
5
2
4 �1
�2
3
5+
2
4 ✏i
0
3
5
and therefore (Yi
, Pi
) for individual i is multivariate normal with expected covariance matrix
EX
[Cov(Yi
, Pi
)] = EX
2
4 X(i) 0M
0M
X(i)
3
5
2
641
k�21k1
IM
h
expk�2
1k1k�22k1
IM
h
expk�2
1k1k�22k1
IM
h
22
k�22k1
IM
3
75
2
4 X(i) 0M
0M
X(i)
3
5>
= EX
2
64
Pm
X
2im
k�21k1
h
ex
Pm
X
2imp
k�21k1k�2
2k1
h
ex
Pm
X
2imp
k�21k1k�2
2k1
h
22
Pm
X
2im
k�22k1
3
75
=
2
4 1 hex
qk�2
1k1
k�22k1
hex
qk�2
1k1
k�22k1
h22k�2
1k1
k�22k1
3
5
and therefore the expected correlation E[Cor(Yi
, Pi
)] is h
exph
22
qk�2
1k1
k�22k1
k�22k1
k�21k1
= ⇢ge
ph21. The expected popula-
tion correlation tends to the sample correlation as the number of samples increases, therefore
⇢MAX
Y P
= Cor(Y, P ) ! ⇢ge
qh21 (5)
as N ! 1
4
.CC-BY-NC-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint. http://dx.doi.org/10.1101/036657doi: bioRxiv preprint first posted online Jan. 14, 2016;
13
DescriptionofSupplementaldata
Supplementaldataincludefivefiguresandonetable.
Acknowledgements
TheauthorswouldliketoacknowledgeLiorPachterandHilaryFinucaneforinsightful
discussionabouttheproblem.BCBissupportedbytheNSFGRFP.ALPissupportedbyNIH
grantR01HG006399.NZissupportedbyNIHgrantK25HL121295.
WebResources
Popcornisavailableathttps://github.com/brielin/popcorn
References
1.Robinson,M.R.,Hemani,G.,Medina-Gomez,C.,Mezzavilla,M.,Esko,T.,Shakhbazov,K.,Powell,J.E.,Vinkhuyzen,A.,Berndt,S.I.,Gustafsson,S.,etal.(2015).PopulationgeneticdifferentiationofheightandbodymassindexacrossEurope.Nat.Genet.47,1357–1362.2.Burt,V.L.,Whelton,P.,Roccella,E.J.,Brown,C.,Cutler,J.A.,Higgins,M.,Horan,M.J.,andLabarthe,D.(1995).PrevalenceofhypertensionintheUSadultpopulation.ResultsfromtheThirdNationalHealthandNutritionExaminationSurvey,1988-1991.Hypertension25,305–313.3.Coram,M.A.,Candille,S.I.,Duan,Q.,Chan,K.H.K.,Li,Y.,Kooperberg,C.,Reiner,A.P.,andTang,H.(2015).LeveragingMulti-ethnicEvidenceforMappingComplexTraitsinMinorityPopulations:AnEmpiricalBayesApproach.Am.J.Hum.Genet.96,740–752.4.Morris,A.P.(2011).TransethnicMeta-AnalysisofGenomewideAssociationStudies.Genet.Epidemiol.35,809–822.5.Bustamante,C.D.,DeLaVega,F.M.,andBurchard,E.G.(2011).Genomicsfortheworld.Nature475,163–165.6.(2011).Agenome-wideassociationstudyinEuropeansandSouthAsiansidentifiesfivenewlociforcoronaryarterydisease.NatGenet43,339–344.7.Okada,Y.,Wu,D.,Trynka,G.,Raj,T.,Terao,C.,Ikari,K.,Kochi,Y.,Ohmura,K.,Suzuki,A.,Yoshida,S.,etal.(2014).Geneticsofrheumatoidarthritiscontributestobiologyanddrugdiscovery.Nature506,376–381.8.deCandia,T.R.,Lee,S.H.,Yang,J.,Browning,B.L.,Gejman,P.V.,Levinson,D.F.,Mowry,B.J.,Hewitt,J.K.,Goddard,M.E.,O’Donovan,M.C.,etal.(2013).AdditiveGeneticVariationinSchizophreniaRiskIsSharedbyPopulationsofAfricanandEuropeanDescent.Am.J.Hum.Genet.93,463–470.9.Lee,S.,Teslovich,T.M.,Boehnke,M.,andLin,X.(2013).GeneralFrameworkforMeta-analysisofRareVariantsinSequencingAssociationStudies.Am.J.Hum.Genet.93,42–53.10.Palla,L.,andDudbridge,F.(2015).AFastMethodthatUsesPolygenicScorestoEstimatetheVarianceExplainedbyGenome-wideMarkerPanelsandtheProportionofVariantsAffectingaTrait.Am.J.Hum.Genet.97,250–259.
.CC-BY-NC-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint. http://dx.doi.org/10.1101/036657doi: bioRxiv preprint first posted online Jan. 14, 2016;
14
11.Yang,J.,Ferreira,T.,Morris,A.P.,Medland,S.E.,Madden,P.A.F.,Heath,A.C.,Martin,N.G.,Montgomery,G.W.,Weedon,M.N.,Loos,R.J.,etal.(2012).Conditionalandjointmultiple-SNPanalysisofGWASsummarystatisticsidentifiesadditionalvariantsinfluencingcomplextraits.Nat.Genet.44,369–375.12.InternationalSchizophreniaConsortium(2009).Commonpolygenicvariationcontributestoriskofschizophreniathatoverlapswithbipolardisorder.Nature460,748–752.13.Zuo,L.,Zhang,C.K.,Wang,F.,Li,C.-S.R.,Zhao,H.,Lu,L.,Zhang,X.-Y.,Lu,L.,Zhang,H.,Zhang,F.,etal.(2011).ANovel,FunctionalandReplicableRiskGeneRegionforAlcoholDependenceIdentifiedbyGenome-WideAssociationStudy.PLoSONE6,e26726.14.Fesinmeyer,M.,North,K.,Ritchie,M.,Lim,U.,Franceschini,N.,Wilkens,L.,Gross,M.,Bůžková,P.,Glenn,K.,Quibrera,P.,etal.(2013).Geneticriskfactorsforbodymassindexandobesityinanethnicallydiversepopulation:resultsfromthePopulationArchitectureusingGenomicsandEpidemiology(PAGE)Study.Obes.SilverSpringMd21,10.1002/oby.20268.15.Chang,M.,Ned,R.M.,Hong,Y.,Yesupriya,A.,Yang,Q.,Liu,T.,Janssens,A.C.J.W.,andDowling,N.F.(2011).Racial/EthnicVariationintheAssociationofLipid-RelatedGeneticVariantsWithBloodLipidsintheUSAdultPopulation.Circ.Cardiovasc.Genet.4,523–533.16.Waters,K.M.,Stram,D.O.,Hassanein,M.T.,LeMarchand,L.,Wilkens,L.R.,Maskarinec,G.,Monroe,K.R.,Kolonel,L.N.,Altshuler,D.,Henderson,B.E.,etal.(2010).ConsistentAssociationofType2DiabetesRiskVariantsFoundinEuropeansinDiverseRacialandEthnicGroups.PLoSGenet.6,e1001078.17.Lee,S.H.,Yang,J.,Goddard,M.E.,Visscher,P.M.,andWray,N.R.(2012).Estimationofpleiotropybetweencomplexdiseasesusingsingle-nucleotidepolymorphism-derivedgenomicrelationshipsandrestrictedmaximumlikelihood.Bioinformatics28,2540–2542.18.Bulik-Sullivan,B.,Finucane,H.K.,Anttila,V.,Gusev,A.,Day,F.R.,Loh,P.-R.,ReproGenConsortium,PsychiatricGenomicsConsortium,GeneticConsortiumforAnorexiaNervosaoftheWellcomeTrustCaseControlConsortium3,Duncan,L.,etal.(2015).Anatlasofgeneticcorrelationsacrosshumandiseasesandtraits.NatGenet47,1236–1241.19.Falconer,D.S.,andMackay,T.F.C.(1996).Introductiontoquantitativegenetics(Essex,England:Longman).20.Loh,P.-R.,Tucker,G.,Bulik-Sullivan,B.K.,Vilhjálmsson,B.J.,Finucane,H.K.,Salem,R.M.,Chasman,D.I.,Ridker,P.M.,Neale,B.M.,Berger,B.,etal.(2015).EfficientBayesianmixed-modelanalysisincreasesassociationpowerinlargecohorts.Nat.Genet.47,284–290.21.Bulik-Sullivan,B.K.,Loh,P.-R.,Finucane,H.K.,Ripke,S.,Yang,J.,SchizophreniaWorkingGroupofthePsychiatricGenomicsConsortium,Patterson,N.,Daly,M.J.,Price,A.L.,andNeale,B.M.(2015).LDScoreregressiondistinguishesconfoundingfrompolygenicityingenome-wideassociationstudies.NatGenet47,291–295.22.Finucane,H.K.,Bulik-Sullivan,B.,Gusev,A.,Trynka,G.,Reshef,Y.,Loh,P.-R.,Anttila,V.,Xu,H.,Zang,C.,Farh,K.,etal.(2015).Partitioningheritabilitybyfunctionalannotationusinggenome-wideassociationsummarystatistics.NatGenet47,1228–1235.23.Yang,J.,Benyamin,B.,McEvoy,B.P.,Gordon,S.,Henders,A.K.,Nyholt,D.R.,Madden,P.A.,Heath,A.C.,Martin,N.G.,Montgomery,G.W.,etal.(2010).CommonSNPsexplainalargeproportionoftheheritabilityforhumanheight.Nat.Genet.42,565–569.24.So,H.-C.,Li,M.,andSham,P.C.(2011).Uncoveringthetotalheritabilityexplainedbyalltruesusceptibilityvariantsinagenome-wideassociationstudy.Genet.Epidemiol.n/a–n/a.25.Vattikuti,S.,Guo,J.,andChow,C.C.(2012).HeritabilityandGeneticCorrelationsExplainedbyCommonSNPsforMetabolicSyndromeTraits.PLoSGenet.8,e1002637.26.’tHoen,P.A.C.,Friedländer,M.R.,Almlöf,J.,Sammeth,M.,Pulyakhina,I.,Anvar,S.Y.,Laros,J.F.J.,Buermans,H.P.J.,Karlberg,O.,Brännvall,M.,etal.(2013).Reproducibilityof
.CC-BY-NC-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint. http://dx.doi.org/10.1101/036657doi: bioRxiv preprint first posted online Jan. 14, 2016;
15
high-throughputmRNAandsmallRNAsequencingacrosslaboratories.Nat.Biotechnol.31,1015–1022.27.Morris,A.P.,Voight,B.F.,Teslovich,T.M.,Ferreira,T.,Segrè,A.V.,Steinthorsdottir,V.,Strawbridge,R.J.,Khan,H.,Grallert,H.,Mahajan,A.,etal.(2012).Large-scaleassociationanalysisprovidesinsightsintothegeneticarchitectureandpathophysiologyoftype2diabetes.Nat.Genet.44,981–990.28.Cho,Y.S.,Chen,C.-H.,Hu,C.,Long,J.,HeeOng,R.T.,Sim,X.,Takeuchi,F.,Wu,Y.,Go,M.J.,Yamauchi,T.,etal.(2012).Meta-analysisofgenome-wideassociationstudiesidentifieseightnewlocifortype2diabetesineastAsians.NatGenet44,67–72.29.Su,Z.,Marchini,J.,andDonnelly,P.(2011).HAPGEN2:simulationofmultiplediseaseSNPs.Bioinformatics27,2304–2305.30.Lee,S.H.,Wray,N.R.,Goddard,M.E.,andVisscher,P.M.(2011).EstimatingMissingHeritabilityforDiseasefromGenome-wideAssociationStudies.Am.J.Hum.Genet.88,294–305.31.Speed,D.,Hemani,G.,Johnson,M.R.,andBalding,D.J.(2012).ImprovedHeritabilityEstimationfromGenome-wideSNPs.Am.J.Hum.Genet.91,1011–1021.32.Yang,J.,Bakshi,A.,Zhu,Z.,Hemani,G.,Vinkhuyzen,A.A.E.,Lee,S.H.,Robinson,M.R.,Perry,J.R.B.,Nolte,I.M.,vanVliet-Ostaptchouk,J.V.,etal.(2015).Geneticvarianceestimationwithimputedvariantsfindsnegligiblemissingheritabilityforhumanheightandbodymassindex.Nat.Genet.47,1114–1120.33.Price,A.L.,Helgason,A.,Thorleifsson,G.,McCarroll,S.A.,Kong,A.,andStefansson,K.(2011).Single-TissueandCross-TissueHeritabilityofGeneExpressionViaIdentity-by-DescentinRelatedorUnrelatedIndividuals.PLoSGenet.7,e1001317.34.Gaffney,D.J.(2013).GlobalPropertiesandFunctionalComplexityofHumanGeneRegulatoryVariation.PLoSGenet.9,e1003501.35.Stahl,E.A.,Wegmann,D.,Trynka,G.,Gutierrez-Achury,J.,Do,R.,Voight,B.F.,Kraft,P.,Chen,R.,Kallberg,H.J.,Kurreeman,F.A.S.,etal.(2012).Bayesianinferenceanalysesofthepolygenicarchitectureofrheumatoidarthritis.Nat.Genet.44,483–489.36.Marenberg,M.E.,Risch,N.,Berkman,L.F.,Floderus,B.,anddeFaire,U.(1994).Geneticsusceptibilitytodeathfromcoronaryheartdiseaseinastudyoftwins.N.Engl.J.Med.330,1041–1046.37.Nora,J.J.,Lortscher,R.H.,Spangler,R.D.,Nora,A.H.,andKimberling,W.J.(1980).Genetic--epidemiologicstudyofearly-onsetischemicheartdisease.Circulation61,503–508.38.Chen,X.,Kuja-Halkola,R.,Rahman,I.,Arpegård,J.,Viktorin,A.,Karlsson,R.,Hägg,S.,Svensson,P.,Pedersen,N.L.,andMagnusson,P.K.E.(2015).DominantGeneticVariationandMissingHeritabilityforHumanComplexTraits:InsightsfromTwinversusGenome-wideCommonSNPModels.Am.J.Hum.Genet.97,708–714.39.Zhu,Z.,Bakshi,A.,Vinkhuyzen,A.A.E.,Hemani,G.,Lee,S.H.,Nolte,I.M.,vanVliet-Ostaptchouk,J.V.,Snieder,H.,Esko,T.,Milani,L.,etal.(2015).DominanceGeneticVariationContributesLittletotheMissingHeritabilityforHumanComplexTraits.Am.J.Hum.Genet.96,377–385.40.Marchini,J.,andHowie,B.(2010).Genotypeimputationforgenome-wideassociationstudies.NatRevGenet11,499–511.41.Ma,R.C.W.,andChan,J.C.N.(2013).Type2diabetesinEastAsians:similaritiesanddifferenceswithpopulationsinEuropeandtheUnitedStates:DiabetesinEastAsians.Ann.N.Y.Acad.Sci.1281,64–91.
FigureTitlesandLegends
.CC-BY-NC-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint. http://dx.doi.org/10.1101/036657doi: bioRxiv preprint first posted online Jan. 14, 2016;
16
Figure1:Standarderrorofourestimateofgeneticeffectandimpactcorrelationasa
functionofthenumberofSNPsandindividuals.Phenotypesweresimulatedwithh12=0.5,
h22=0.5andρgi,e=0.5andgenotypesweresimulatedwithHapGen2tomatchEuropean(EUR)
andEastAsian(EAS)populations.
Figure2:DistributionofgeneticcorrelationcomparisonbetweenpopcornandGCTA.
Distributionwascomputedusingagaussiankdeonthesetofgeneticcorrelationestimates.
Figure3:Geneticcorrelationasafunctionofheritabilityforgeneexpression.Themeanand
standarderrorofthegeneticcorrelationofthesetofgeneswithh12andh22exceeding
thresholdhineachanalysis(y-axis)isplottedagainsth(x-axis).
Tables
Table1:Trueandestimatedvaluesofthegeneticimpactandeffectcorrelationinsimulated
EUR-likeandEAS-likegenotypes.Resultsaretheaverageof100simulationswith
phenotypeheritabilityof0.5ineachpopulation.
Table2:HeritabilityandgeneticcorrelationofRAandT2DbetweenEURandEAS
populations.EURRAdatacontained8,875casesand29,367controlsforastudyprevalence
of0.23.EASRAdatacontained4,873casesand17,642controlsforastudyprevalenceof
0.22.RAdiseaseprevalencewasassumedtobe0.5%inbothpopulations7.T2DEURdata
contained12171casesand56862controlsforastudyprevalenceof0.18.T2DEASdata
contained6952casesand11865controlsforastudyprevalenceof0.37.T2DEUR
prevalencewasassumedtobe8%27whileT2DEASprevalencewasassumedtobe9%41.
RA T2D
MAF>0.01 MAF>0.05
Model Independent 0.500 0.478 0.490 0.459 0.500 0.489 0.490 0.469
Inverse 0.438 0.500 0.520 0.493 0.479 0.500 0.503 0.489
Rare 0.500 0.482 NA NA 0.500 0.498 0.916 0.731
Common 0.500 0.500 0.308 0.286 0.500 0.500 0.298 0.277
Difference 0.500 0.419 0.428 0.414 0.500 0.461 0.479 0.468
Adversarial 0.695 0.588 0.730 0.676 0.698 0.648 0.739 0.698
⇢ge ⇢gi ⇢ge ⇢gi ⇢ge ⇢gi ⇢ge ⇢gi
.CC-BY-NC-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint. http://dx.doi.org/10.1101/036657doi: bioRxiv preprint first posted online Jan. 14, 2016;
17
hEUR2obs 0.265(0.022) 0.146(0.008)
hEAS2obs 0.320(0.038) 0.093(0.020)
hEUR2lia 0.177(0.015) 0.242(0.013)
hEAS2lia 0.221(0.026) 0.105(0.021)
ρge 0.463(0.058) 0.621(0.088)
pρge>0 1.37e-15 1.70e-12
pρge<1 2.53e-20 1.66e-05
ρgi 0.455(0.056) 0.606(0.083)
pρgi>0 8.16e-16 2.85e-13
pρgi<1 4.87e-22 2.06e-06
.CC-BY-NC-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint. http://dx.doi.org/10.1101/036657doi: bioRxiv preprint first posted online Jan. 14, 2016;
.CC-BY-NC-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint. http://dx.doi.org/10.1101/036657doi: bioRxiv preprint first posted online Jan. 14, 2016;
.CC-BY-NC-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint. http://dx.doi.org/10.1101/036657doi: bioRxiv preprint first posted online Jan. 14, 2016;
.CC-BY-NC-ND 4.0 International licenseIt is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint. http://dx.doi.org/10.1101/036657doi: bioRxiv preprint first posted online Jan. 14, 2016;