Revealing the causative variant in Mendelian patient ...computation on encrypted data. These provide...

22
1 Title Revealing the causative variant in Mendelian patient genomes without revealing patient genomes Authors Karthik A. Jagadeesh 1,5 , David J. Wu 1,5 , Johannes A. Birgmeier 1 , Dan Boneh 1,2,6 , Gill Bejerano 1,3,4,6 Affiliations 1 Department of Computer Science, Stanford University 2 Department of Electrical Engineering, Stanford University 3 Department of Developmental Biology, Stanford University 4 Department of Pediatrics (Medical Genetics), Stanford University 5 These authors contributed equally 6 Corresponding authors: [email protected] (D.B) and [email protected] (G.B) Abstract Given the rapidly growing utility of critical health information revealed in the human genome, secure genomic computation is essential to moving forward, especially as genome sequencing becomes commonplace. We devise and implement proof-of-principle computational operations for precisely identifying causal variants in Mendelian patients using secure multiparty computation methods based on Yao’s protocol. We show multiple real scenarios (small patient cohorts, trio analysis, two hospital collaboration) where the causal variant is discovered jointly, while keeping up to 99.7% of all participants’ most sensitive genomic information private. All similar operations performed today to diagnose such cases are done openly, keeping 0% of participants’ genomic information private. Our work will help usher in an era where genomes can be both utilized and truly protected. Introduction Rare diseases affect 1 in 33 babies. Exome and genome sequencing have revolutionized the diagnosis of thousands of rare Mendelian diseases to thousands of different human genes 1–3 . Thousands of additional rare Mendelian diseases and human genes await discovery. Frequency-based filters have proven extremely effective in providing diagnosis in such cases 4 . In essence, variants found in a control population (common variants) are likely to be benign 5 while functional rare variants not found in the control population but seen in multiple affected individuals are likely to be disease causing 6–8 . These filters seek the gene or variant present in all (most) affected individuals but in no (very few) unaffected individuals. For example, one can take a small cohort of unrelated individuals suspected of suffering from the same genetic disorder, and compare their genomes to that of tens of thousands of unaffected individuals (e.g., from the exome aggregation consortium, ExAC 5 ). As we show below, in multiple scenarios, the gene with rare functional mutations in most patients in our small cohorts is indeed causal of their condition. Frequency-based computation highlights the fundamental “serve or protect” dilemma of genomic data: “Serve:” to find the root cause of a patient’s disease, one wishes to compare a patient genome to as many other genomes as possible, both affected and unaffected, related and unrelated. Thus, to advance modern medicine, all sequenced genomes should be shared. “Protect:” one’s genome continues to reveal more and more about oneself, including susceptibility to a variety of diseases 9 . . CC-BY-NC-ND 4.0 International license a certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under The copyright holder for this preprint (which was not this version posted January 27, 2017. ; https://doi.org/10.1101/103655 doi: bioRxiv preprint

Transcript of Revealing the causative variant in Mendelian patient ...computation on encrypted data. These provide...

Page 1: Revealing the causative variant in Mendelian patient ...computation on encrypted data. These provide different tradeoffs in complexity and efficiency17. In this work, we use Yao’s

1

TitleRevealingthecausativevariantinMendelianpatientgenomeswithoutrevealingpatientgenomes

AuthorsKarthikA.Jagadeesh1,5,DavidJ.Wu1,5,JohannesA.Birgmeier1,DanBoneh1,2,6,GillBejerano1,3,4,6

Affiliations1DepartmentofComputerScience,StanfordUniversity2DepartmentofElectricalEngineering,StanfordUniversity3DepartmentofDevelopmentalBiology,StanfordUniversity4DepartmentofPediatrics(MedicalGenetics),StanfordUniversity5Theseauthorscontributedequally6Correspondingauthors:[email protected](D.B)[email protected](G.B)

AbstractGiventherapidlygrowingutilityofcriticalhealthinformationrevealedinthehumangenome,securegenomiccomputationisessentialtomovingforward,especiallyasgenomesequencingbecomescommonplace.Wedeviseandimplementproof-of-principlecomputationaloperationsforpreciselyidentifyingcausalvariantsinMendelianpatientsusingsecuremultipartycomputationmethodsbasedonYao’sprotocol.Weshowmultiplerealscenarios(smallpatientcohorts,trioanalysis,twohospitalcollaboration)wherethecausalvariantisdiscoveredjointly,whilekeepingupto99.7%ofallparticipants’mostsensitivegenomicinformationprivate.Allsimilaroperationsperformedtodaytodiagnosesuchcasesaredoneopenly,keeping0%ofparticipants’genomicinformationprivate.Ourworkwillhelpusherinanerawheregenomescanbebothutilizedandtrulyprotected.

IntroductionRarediseasesaffect1in33babies.ExomeandgenomesequencinghaverevolutionizedthediagnosisofthousandsofrareMendeliandiseasestothousandsofdifferenthumangenes1–3.ThousandsofadditionalrareMendeliandiseasesandhumangenesawaitdiscovery.Frequency-basedfiltershaveprovenextremelyeffectiveinprovidingdiagnosisinsuchcases4.Inessence,variantsfoundinacontrolpopulation(commonvariants)arelikelytobebenign5whilefunctionalrarevariantsnotfoundinthecontrolpopulationbutseeninmultipleaffectedindividualsarelikelytobediseasecausing6–8.Thesefiltersseekthegeneorvariantpresentinall(most)affectedindividualsbutinno(veryfew)unaffectedindividuals.

Forexample,onecantakeasmallcohortofunrelatedindividualssuspectedofsufferingfromthesamegeneticdisorder,andcomparetheirgenomestothatoftensofthousandsofunaffectedindividuals(e.g.,fromtheexomeaggregationconsortium,ExAC5).Asweshowbelow,inmultiplescenarios,thegenewithrarefunctionalmutationsinmostpatientsinoursmallcohortsisindeedcausaloftheircondition. Frequency-basedcomputationhighlightsthefundamental“serveorprotect”dilemmaofgenomicdata:“Serve:”tofindtherootcauseofapatient’sdisease,onewishestocompareapatientgenometoasmanyothergenomesaspossible,bothaffectedandunaffected,relatedandunrelated.Thus,toadvancemodernmedicine,allsequencedgenomesshouldbeshared.“Protect:”one’sgenomecontinuestorevealmoreandmoreaboutoneself,includingsusceptibilitytoavarietyofdiseases9.

.CC-BY-NC-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted January 27, 2017. ; https://doi.org/10.1101/103655doi: bioRxiv preprint

Page 2: Revealing the causative variant in Mendelian patient ...computation on encrypted data. These provide different tradeoffs in complexity and efficiency17. In this work, we use Yao’s

2

Sharingitwithotherscanleadtodiscriminationandbias.Toprotectitsownerandnextofkin,nosequencedgenomeshouldbeshared. Todate,thisdilemmahasbeensolvedbyallowinginstitutionsunrestrictedaccesstoallthegenomesintheirpossession.Limitedsharingbetweeninstitutionsisdonebyprovidingobfuscatedsummarystatistics10.Currentcommonlyadoptedmethodsforsharinghaveshortcomingsthatmakethemsuboptimal.Providingfullaccessatindividualinstitutionsallowsfortoomuchinformationtobesharedincertainsituations11.Disease-specificbeaconsarepronetoattackandcanendupidentifyingindividualsparticipatinginthestudy12.Beaconsalsoonlyprovideallele-presencequerycapabilitiesanddonothavetheflexibilityneededforanalyzingmultifactorialvariantinteractionswithinanindividual13.Itisalsoriskytosharegenomicdataintheclearwiththird-partyservicesspecializingingenomicanddiseaseanalysis.Weareunawareofanycryptographically-securemethodforsharinggenomicdatatoperformcomputationaloperationsthatallowidentifyingcausalvariantsinpatients. Tobetterresolvethisdilemma,wefirstnotethatwhileallofthegenomicvariantsfromallindividualsareneededtoperformthecomputation,onlyahandfulofcausalvariantsareultimatelyofinterestinthecontextofMendelianpatients(intheexampleabove,justtherarevariantsinthesinglegenemutatedinmostpatients).

Weintroducehereamodern,proof-of-conceptcryptographicimplementationwhichbothservesandprotects.Thesecurecomputationcanberunonentiregenomes(Serve),whilenopartyinvolvedinthecomputationlearnsanythingabouttheinputsoftheotherparticipantsexceptfortheoutputwhichiscomputedtogether(Protect).Weuserealpatientdatatoshowthatoursecureimplementationrevealsminimalinformationwhilediagnosingpatientgenomesthrough3differentstrategiesusingpracticalamountsofcomputetimeandmemory.Cryptographicmethodshavebeenusedindifferentgenomiccontextssuchasmicrobiomeanalysis14,GWASanalysis15andgenomicalignment16,butthisisthefirstimplementationthatweareawareofthatisgearedtowardsdiagnosingMendelianpatients,atimelyandpotentneed.

MethodsRepresentinggenomicdataasvectorsAssumeeachindividualinvolvedinastudyhasprivateaccesstotheirexome(orgenome).Ifwearelookingtoidentifyacausalvariant,wedefineavariantvector(longlistoflength28,413,589)ofallpossibleraremissense/nonsensevariantsinthehumangenomefromthefirstgeneonchromosome1tothelastgeneonchromosomeY.Weprovideacopyofthisvectortoeachindividual(affectedandunaffected),andaskthemtoprivatelydenoteTrue/Falsenexttoeachvariant(toindicatewhethertheyhavethespecificmutationornot,respectively).Ifwearelookingtoidentifyacausalgene,weprovideeachindividualagenevectorof20,663genesinthehumangenomefromA1BGtoZZZ3.Weaskthemtowrite“1”nexttoageneiftheyhaveoneormorerarefunctionalvariantsinthisgene,andotherwise,theywrite“0”.SeeSupplementaryFigure1A,B.

Definingcomputationsofinterest(MAX,INTERSECTION,SETDIFF)Wedefinethreeoperationsusedforpatientdiagnosis(SupplementaryFigure1C).Imaginetwoaffectedindividualsarerepresentedbytworarefunctionalvariantvectors(True/Falselists).Intersectingthesetwovectorswillrevealalltherarefunctionalvariantstheyshare.Formally,weperformaBooleanINTERSECTION(orAND)operation(xANDy=Trueonlyifx=y=True,andotherwise,itisFalse)betweenallpossiblepatientvariants.Next,ifwealsohaveaccesstoanunaffectedfamilymember,we

.CC-BY-NC-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted January 27, 2017. ; https://doi.org/10.1101/103655doi: bioRxiv preprint

Page 3: Revealing the causative variant in Mendelian patient ...computation on encrypted data. These provide different tradeoffs in complexity and efficiency17. In this work, we use Yao’s

3

canfurtherexcludeanyvarianttheyshare.WedothiswithaBooleansetdifference(SETDIFF)operation(xSETDIFFy=Trueonlyifx=Trueandy=False,otherwiseitisFalse).Finally,imaginewehaveaccesstoasmallcohortofunrelatedpatientssharingasetofphenotypes.Wewouldliketofindthegeneaffectedbyoneormorerarefunctionalvariantsinthegreatestnumberofpatientswithinthecohort.Forthis,weusethepatientgenevectors(0/1lists).Wesum0/1sacrosspatientsforeachgene,andthenweusethemaximum(MAX)operationtofindtheentry(gene)withthegreatestnumber(ofaffectedcases;SupplementaryFigure1C).

Remarkably,moderncryptographyallowsanynumberofindividualstojointlylearnthefinalresultoftheseMAX,SETDIFF,INTERSECTIONoperationswithoutanyofthemlearninganythingelseabouteachother’sgenomes(orvectors).

EncryptionanddecryptionAnimportantcryptographicprimitivewerelyonisasecret-keyencryptionscheme.Inasecret-keyencryptionscheme,asecretkeyisusedtoencryptanddecryptmessageswiththeguaranteethattheencryptionsofanytwomessagesareindistinguishable,andyet,theycanbesuccessfullydecrypted(toobtaintheoriginalmessage)giventhekey(SupplementaryFigure2).

SecuremultipartycomputationMultiplemathematicalframeworksandcomputationalimplementationsexistforsecuremultipartycomputationonencrypteddata.Theseprovidedifferenttradeoffsincomplexityandefficiency17.Inthiswork,weuseYao’sprotocoltosecurelyevaluatefunctionsbetweentwoparties18.Abstractly,wewritethefunctionas!(#$, #&)where#$denotestheinputofthefirstpartyand#&denotestheinputofthesecondparty.Anyfunction! #$, #& canberepresentedbyacombinationofBooleanoperations(forexample,seeSupplementaryFigure3).Yao’sprotocolprovidesawayofevaluatingtheBooleancircuit(operator-by-operator)withoutrevealingtheinputs#$,#&.WeillustratethisindetailinFigure1.

WhileYao’sprotocolprovidesasimpleandefficientsolutionforsecuretwo-partycomputation,inmanyofthescenarioswedescribe,thecomputationoccursamongmultipleparties(e.g.,manyindividuals,eachwiththeirpersonalgenome).Itisverystraightforwardtoreducethegeneralproblemofsecuremultipartycomputationtothatofsecuretwo-partycomputationbyworkinginthe“two-cloud”model.Inthetwo-cloudmodel,weassumethattherearetwonon-colludingservers(e.g.,thesecouldbemanagedbytwoindependentgovernmentagencies)thataggregatetheinputsfromeachpartyinaprivacy-preservingmannerandthenperformthecomputation.EachserveronitsownhasnoknowledgeofthedataasshowninFigure1.7(seeOnlineMethodsandDiscussion).

ProtectionquotientWedefinetheProtectionQuotientasthefractionofprivateinformationthatisnotexposed(toneithertheotherparticipantsnortheentityrunningthecomputation)duringthecomputation.Usingourencryptionscheme,theProtectionQuotientequalsthetotalnumberofpatientvariantswithheldfromtheoutputdividedbythetotalnumberofpatientvariantsinputintothecomputation.Standardunencryptedpatientdiagnosisoperationshaveaprotectionquotientof0%,becauseallvaluesmustbeexposedtoperformthecomputation.Allourapplicationsbelowhaveaprotectionquotientof97.1-99.7%,maximizingprivacywhileretainingfullutility.

.CC-BY-NC-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted January 27, 2017. ; https://doi.org/10.1101/103655doi: bioRxiv preprint

Page 4: Revealing the causative variant in Mendelian patient ...computation on encrypted data. These provide different tradeoffs in complexity and efficiency17. In this work, we use Yao’s

4

ResultsExampleMendelianapplicationsofsecurecomputationToprovethepragmaticutilityofourapproach,wedemonstratethreedifferentsecureoperationsoverrealMendelianpatientswherewesuccessfullyidentifythecausalvariantsineachscenario(Table1):

MAXidentifiesthecausalgeneinsmallpatientcohortswithprotectionquotientabove99%Weuse4smallcohortsofunrelatedindividuals,sufferingfromverydifferentrarediseases:FreemanSheldonSyndrome(FSS),Hadju-CheneySyndrome(HCS),KabukiSyndrome(KaS)andMillerSyndrome(MiS).Eachindividualholdsaprivatelistof211-374rarefunctionalvariantsin210-356genes(total767-2,754variantspercomputation).WeusethesecureMAXfunctiontorevealonlythetopgenemutatedacrosspatientsineachcohort.Inall4cohorts,wefindthatthegenemutatedinmostindividualsistheonethathasbeenproventobethecausalgene:MYH3inFSS6,NOTCH2inHCS19,KMT2DinKaS8andDHODHinMiS7(Table1a).

Securecomputationonlyrevealsthevariantsinthemostmutatedgeneineachcohortwhileprotectingtheremaining764variantsinFSS,1845variantsinHCS,2746variantsinKaSand1055variantsinMiS.Thiscomputationhasaprotectionquotientof99.3-99.7%forall4cohortdiseasedatasets.Thecomputationisperformedoverall20,663genesandcompletesinjust5-10seconds,withoneserverontheEastCoastandtheotherontheWestCoast(Table1a).Thetotalprotocolexecutiontime,bandwidthandcomputetimeallgrowlogarithmicallywiththenumberofcohortindividualsinvolvedinthesecurecomputation(SupplementaryFigure4A).

SETDIFFidentifiesthecausalvariantinatriowithprotectionquotient99.6%Unaffectedmotherandfather,andaffectedmalechildwithfemaleexternalgenitalia,eachholdsalistof164-185(total524)rarefunctionalvariantsfoundintheirexomes.ThesecureSETDIFFoperationrevealstothefamilyandtestprovidersonly2rarevariantsfoundinthechildbutinneitherparent(Table1b).Literaturereviewprovidesadiagnosisbasedononeofthesetwovariants:theACTBgene20.

Securecomputationkeeps522variantsprivatewhilesharingonly2variantswiththetestproviderandallindividualsinvolvedinthecomputation.Thiscomputationhasaprotectionquotientof99.6%.Becausethreepartiesarenowinvolved,thetotalcomputationtimeusingasingleserverthreadoneithercoastis57minutes(Table1b).However,thevariantlistcaneasilybesplitbetweenasmallcomputerarrayoneithercoast,suchthatatypical30-nodeclusterbringscomputationtimedowntounder2minutes.Theprotocolexecutiontime,bandwidthandcomputetimeallgrowlogarithmicallywiththenumberoffamilymembersinvolvedinthesecurecomputation(SupplementaryFigure4B).

INTERSECTIONidentifiespatientsofinterestacross2hospitalswithprotectionquotient97.1%Twoormoregenomecentersmaywanttocomparetheirpatientliststoseeiftogethertheycanfindmultiplepatientswiththesamerarefunctionalmutation,andsimilarphenotypes,whilerevealingnothingelsetoeachother.Forexamplewetook928WashingtonMendelianCenter(WMC)patientsand282BaylorHopkinsCenter(BHC)patients.Foreachhospitalwepreparedalistofover5,000rarefunctionalvariantsseeninoneormoreoftheirpatients.UsingthesecureANDfunction,thetwohospitalsfindashortlistofjust159variantspresentinbothhospitals,pointingatpatientswhowouldbenefitfromphenotypecomparison.Thisshortlistincludes“positivecontrols”suchasknowndiseasevariantNOTCH1:p.E694K,associatedwithpartial/incompletepenetranceofaorticvalvedisease21.IndeedtheWMCandBHCpatientsarephenotypicallycharacterizedwithleftventricularoutflowdefectandthoracicaorticaneurysm,respectively.Thelistalsooffersexcitingnovelgene-diseaseassociations

.CC-BY-NC-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted January 27, 2017. ; https://doi.org/10.1101/103655doi: bioRxiv preprint

Page 5: Revealing the causative variant in Mendelian patient ...computation on encrypted data. These provide different tradeoffs in complexity and efficiency17. In this work, we use Yao’s

5

suchasrarefunctionalvariantHCN3:p.R648H(withfrequency5.47·10-5and0inExACand1000genomesdata,respectively).HCN3isavoltage-gatedcationchannelgene,whosemouseknockoutcausesabnormalventricularactionpotentialwaveform22.Promisingly,inpatientsfromWMCandBHC,thismutationiscorrelatedwithdilatedcardiomyopathyandcoarctationoftheaorta,respectively.

Securecomputationonlyreveals2x159potentialcausativevariantswhileprotectingtheremaining10,749variantswithaprotectionquotientof97.1%.Thiscomputationisperformedoverallrarefunctionalvariantsintheexomewithatotalprotocolexecutiontimeof9.4minutesusingasingleserverthreadoneithercoast(Table1c).Becauseeveryvariantisevaluatedindependently,a30-nodecomputeclusteroneitherendwillreducetotalcomputationtimetobelow20seconds.AswelearntoappreciateMendelianmutationsoutsideoftheexome,thetotaltime,bandwidthandcomputetimescalelinearlywiththesizeofthevariantlistsharedforsecurecomputation(SupplementaryFigure4C).

DiscussionRarediseasesarecumulativelycommon(someestimatethat10%oftheUSpopulationareaffectedwithraredisorders).About7,000rareMendelianconditionshavebeendescribedtodate.Ofthese,approximately4,000havebeendefinitivelydiagnosedassinglegenediseases,mappingtoover4,000genesinthegenome.Theprocedureswedescribeareapplicableforallofthese.Thereareonlyahandfulofmedicalconditionsdiagnosedwithcertaintytotheinteractionsofjust2genes.Farlessisknownfordiseasescausedbymorethan2genes.PersonalGenomicsposesafundamental“serveorprotect”dilemma:shouldoneservetheirgenomeintheserviceofbetterdiagnosisandultimatelydiseaseeradication,orshouldoneprotectoneselfandnextofkinagainstpotentialdiscriminationbyrefusingtosharetheirgenome.ThisdilemmaisparticularlyevidentinthefieldofMendeliandiseases.Itisessentialtodeveloptoolsandmethodstoeffectivelysharegenomeswhilemaintainingtheirprivacyandsecurity.Becausegenomeprivacyisbestservedwhereadefinitivediagnosisexists,wefocusonsinglediseasegenediscoveryanddiagnosis.HerewepresentasecureapproachformultiplepartiestoperformexactcomputationsthatdiagnoseMendeliandiseases,whilekeepingallparticipatinggenomesprivate.

Thescenarioswepresentareallreal.Genomeprivacyisextremelyappealinginallofthem:Completestrangersindiseasecohorts(Table1A)learnnothingabouteachotherexcepttheirshareddisease-causinggenemutations.Forparticipantswheretheassaydoesnotprovideananswer,absolutelynothingisrevealed.Inlargerfamilytrees,moredistantlyrelatedmemberswillappreciategenomeprivacy.Eveninayoungnuclearfamily(e.g.,atrio;Table1B),thetestproviderlearnsalmostnothingexceptthelikelydisease-causingmutationintheoffspring.Moreover,theylearnvirtuallynothingabouttheparentsthemselves.Inthetwohospitalscenario(Table1C),onlyvariantsthatareworthwhilecomparingarerevealedwhilethevastmajorityofvariantsremainprivatetoeachinstitute’sresearchersandpatients.

Inallofthesecases,thequantitiesrevealedandthosethatremainprivateareaprivacyadvocate’sdreamcometrue:Wepredominantlyrevealonlyvariant/scrucialforpatientdiagnosis,familycounselingandanypotentialtreatment.Whatremainsprivateispredominantlyvariantsofunknownsignificance(VUS)thatareoflittlevaluefordiagnosingone’smedicalcondition.However,thesesameVUSvariantsalmostcertainlyuniquelyidentifyapersonasaparticipantinananalysis,andhavethepotentialtorevealnoworinthefutureotherpersonaltraitsthatmaybefurthercausefordiscrimination.

Forthisproof-of-conceptwork,weassumethattheprotocolparticipantsare“honest-but-curious,”(sometimesreferredtoas“semi-honest”)—thatis,weassumethatthepartiesareproperly

.CC-BY-NC-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted January 27, 2017. ; https://doi.org/10.1101/103655doi: bioRxiv preprint

Page 6: Revealing the causative variant in Mendelian patient ...computation on encrypted data. These provide different tradeoffs in complexity and efficiency17. In this work, we use Yao’s

6

incentivizedtohonestlyfollowtheprotocol,butattheendoftheprotocolexecution,theymaytrytolearnsomeadditionalinformation(aboutotherparties’inputs)basedonthemessagestheyreceiveduringtheprotocolexecution.Wesaythataprotocolissecureiftheonlyinformationanypartylearnsbyparticipatingintheprotocolcanbeinferredjustfromthatparties’inputandtheoveralloutputofthecomputation.Inotherwords,noneofthepartiesshouldbeabletolearnsomethingaboutanotherparties’inputotherthanwhatisexplicitlyrevealedbytheoutputofthefunction.

Yao’sprotocolgivesanefficientsolutionforsecuretwo-partycomputationinthepresenceofsemi-honestadversaries23.Wenotethattherearewell-establishedwaystoextendYao’sprotocoltoadditionallyprovidesecurityagainstmaliciouspartieswhodeviatefromtheprotocoldescriptioninordertocompromisetheprivacyofotherparticipantsorcorrupttheresultsofthecomputation24.Inaddition,protectingagainstparticipantsthatsubmitmalicious(ormalformed)inputstotheprotocolcanbedonebyensuringthatifaparticipant’svariantvectordoesnotmeetcertaincriteria,orisnotaccompaniedbyanappropriatecertificate,thenthecomputationabortsanddoesnotproduceanyoutput.Furthermore,inthispaper,weintroduceanoperation-specific“protectionquotient”,anovelmetrictoassessthefractionofinformationsecuredbythecomputation.Theprotectionquotientcanbeusedtofurtherrestricttheoutputreturnedtoallpartiesifthedefinedprivacyrequirementsarenotmet.Forinstance,ifatrioanalysisresultsinmorethanafewexpecteddenovoexomemutations,onlyanerrormessagewillbeproduced.Thisapproachispreferredforexampletodifferentialprivacy25,26whichaddsrandomgenomicvariationasnoiseintoaggregatedsummarystatisticstotryandavoidindividualidentificationinpooledgenomicsdata15.

Thebasicprincipleunderlyingourdesignistoperformexactsecurecomputationonthecomplete(private)genomesofallparticipatingindividuals.Thisisindirectcontrasttothemoretraditionalandlesseffectiveroutesofpublishingobfuscatedfrequenciesaggregatedacrossmultipleindividuals.Thecomputationalresourcesweusetoretaingenomicprivacyarenotnegligible,yetareperfectlywithinthecapabilitiesofoff-the-shelfmoderncomputerstocompletetheoperationinsecondsorminutes,evenwhencommunicatingbetweentheEastandWestcoasts.Andwhilenosecuritymechanismmaybeperfectlyimpenetrable,itiscertainlypreferabletohaveasecuritymechanisminplace(especiallyifitallowsforexactcomputation)wherenonecurrentlyexist.Manyfurtherextensionsandapplicationsofourcomputationalframeworkarepossible,andaresuretoprovideincentivesforthedevelopmentofmoresecureandfastermethods.Awidespreaddeploymentofcomputerlibrariesefficientlyimplementingtheseprincipleswillencourageindividualstosecurelycontributetheirgenomesforthecommongood,andthusgreatlyfueladvancesinbothpersonalgenomicsandprivacyinthe21stcentury.

On-LineMethodsPatientdatasetsWholeexomesequencesofpatientswereobtainedfromdbGaPstudiesphs000204.v1.p16(FreemanSheldonSyndrome),phs000244.v1.p17(MillerSyndrome),phs000295.v1.p18(KabukiSyndrome),andphs000477.v1.p1(Hajdu-CheneySyndrome).Pre-processedvariantcallformat(VCF)filesforpatientsfrom2CentersforMendelianGenomicswereobtainedfromdbGaPstudiesphs000693.v4.p1(UniversityofWashington),andphs000711.v3.p1(BaylorHopkins).OurtriofamilywasobtainedfromStanfordHospital.AllhumansubjectresearchwasperformedunderguidelinesapprovedbytheStanfordInstitutionalReviewBoard.

.CC-BY-NC-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted January 27, 2017. ; https://doi.org/10.1101/103655doi: bioRxiv preprint

Page 7: Revealing the causative variant in Mendelian patient ...computation on encrypted data. These provide different tradeoffs in complexity and efficiency17. In this work, we use Yao’s

7

SequencingreadsweremappedtotheGRCh37/hg19assemblyofthehumangenomeusingBWAMEMv0.7.10-r78927.VariantswerecalledusingGATKv3.4-46-gbc02625followingtheHaplotypeCallerworkflowfromtheGATKBestPractices28.

VariantannotationANNOVARv527wasusedtoannotatevariantswithpredictedeffectonproteincodinggenesusinggeneisoformsfromtheENSEMBLgenesetversion75forthehg19/GRCh37assemblyofthehumangenome29,30.Allcanonicalgeneisoformswereusedwherethetranscriptstartandendaremarkedascompleteandthecodingspanisamultipleofthree.

CryptographictechniquesInasecuremultipartycomputation(MPC)protocol18,31,agroupofusers(oftencalledparties)seektojointlycomputeafunctionovertheirinputswithoutrevealinganyadditionalinformationabouttheirparticularinputs.Thefunctionthatthepartiescomputeisdeterminedbasedonthespecificscenario.Thecomputationconsistsofseveralroundsofinteraction,whereineachround,thepartiesexchangeaseriesofmessages.Attheconclusionoftheprotocol,eachparticipantlearnstheoutputofthecomputationevaluatedoneveryone’sjointinput.Noadditionalinformationbeyondtheexplicitoutputisrevealedtoanyparty(theprocessisabstractedinFigure1).

EveryarithmeticcomputationcanbeexpressedasasequenceofBooleanlogicaloperations(thatis,operationsonbits 0,1 ).Thisispreciselyhowthemoderncomputerworks.Yao’sprotocolallowstwousers,AliceandBob,tocomputearbitraryfunctionsovertheirinputs.Moreprecisely,ifAlicehasaninput#andBobhasaninput*,Yao’sprotocolallowsthemtocompute!(#, *)inawaysuchthatAlicelearnsnothingabout*andBoblearnsnothingabout#otherthantheoutputvalue!(#, *).Ingeneral,expressingafunctionintermsofBooleanoperationsgreatlyincreasesthecomputationalcostofevaluatingthefunction.TomaximizetheefficiencyofYao’sprotocol,itisimportanttochoosefunctionalitieswithsimpleorcompactrepresentationsasBooleancircuits.AnexampleofaBooleancircuitisshowninSupplementaryFigure3.

Inthiswork,wecastdiagnosingMendelianpatientsas(simple)arithmetic/logiccomputationsthatadmitefficientBooleancircuitrepresentations.Wenowdescribehowthesecurecomputationprotocolswork.Todothiswefirstintroducetwostandardtoolsfromcryptography:(1)symmetric(secret-key)encryption32and(2)oblivioustransfer33–35.

EncryptionanddecryptionAsecret-keyencryptionschemeconsistsoftwofunctions:EncryptandDecrypt.Theencryptionfunctiontakesacryptographickeykandamessagemandoutputsaciphertext4.Thedecryptionfunctiontakesthecryptographickey5andaciphertext4andoutputsamessage6.Intuitively,encryptionanddecryptionareinverseoperations:ifweencryptamessageunderakey5,decryptingtheresultingciphertextwiththesamekey5recoverstheoriginalmessage.Moreprecisely,wecansaythatforanykey5andanymessage6,Decrypt 5, Encrypt 5,6 = 6.Inasymmetric(orsecret-key)encryptionscheme,boththeencryptionandthedecryptionfunctionsrequireknowledgeofthesecretcryptographickey.Thekeyisarandomstringdrawnfromsomekey-space.Theprecisenatureofthekey-spacevariesdependingonthedetailsoftheencryptionscheme,andisimmaterialtoourpresentationinthispaper.Anencryptionschemeisconsideredtobesecureiftheciphertextdoesnotrevealanyinformationabouttheunderlyingmessagetoanyuserwhodoesnotpossessthesecretencryptionkey(certainly,auserwhoholdsthesecretkeycandecryptandlearnthemessage).Oneway

.CC-BY-NC-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted January 27, 2017. ; https://doi.org/10.1101/103655doi: bioRxiv preprint

Page 8: Revealing the causative variant in Mendelian patient ...computation on encrypted data. These provide different tradeoffs in complexity and efficiency17. In this work, we use Yao’s

8

toformalizethisistosaythatauserwhodoesnothavetheencryptionkeyisunabletotellanencryptionofamessage68apartfromanencryptionofanothermessage6$.Inotherwords,ciphertextshideallinformationabouttheirunderlyingmessagetoalluserswhodonothavetheencryptionkey.WeillustratethisinSupplementaryFigure2.

Underthisdefinition,messagescanalsobeencryptedmultipletimes.Forinstance,amessage6canbe“doubleencrypted”undertwokeys5$and5&byfirstencrypting6using5$andthenencryptingtheresultingciphertextusingthesecondkey5&.Thisprocedureyieldsanotherciphertext.Decryptionproceedsbyfirstdecryptingwithkey5&,andthendecryptingtheresult(aciphertext)with5$.Inparticular,wecanwrite

Decrypt 5$, Decrypt 5&, Encrypt 5&, Encrypt 5$,6 = 6

Securityofthedoubleencryptionschemefollowsdirectlyfromthesecurityoftheunderlyingencryptionscheme.Inparticular,auserwhodoesnothaveboth5$and5&cannotlearnanyinformationabouttheunderlyingmessagethathasbeendoublyencryptedusing5$and5&.Numeroussymmetric(secret-key)encryptionschemesexistintheliterature32.

OblivioustransferAnoblivioustransfer(OT)protocol33–35isatwo-partyprotocolbetweenasenderandareceiver.AnOTprotocolenablesthereceivertoselectivelyobtainoneoftwopossiblemessagesfromthesenderwithoutrevealingtothesenderwhichmessagethereceiverrequested.Moreprecisely,thesenderholdstwomessages,denoted58and5$andthereceiverholdsaselectionbit: ∈ 0,1 .AttheendoftheOTprotocol,thereceiverobtainsthechosenmessage5<andlearnsnothingabouttheothermessage5$=<.Thesenderdoesnotlearnanythingaboutthereceiver’schoicebit:.Numerousoblivioustransferprotocolshavebeenproposedintheliterature33–35.

OverviewofstepsforsecurecomputationInasecuretwo-partycomputationprotocol,Aliceholdsaninput# ∈ 0,1 >andBobholdsaninput* ∈0,1 >.Wewrite 0,1 >todenoteabinaryinputoflength?(e.g.,forinstance,?couldbeofthebinaryrepresentationofthevariantvectororthegenevectorwedefineinourmaintext).Theirgoalistocomputeafunction!(#, *)ontheirjointinput #, * .Thecomputationisconsidered“secure”ifattheendofthecomputation,theonlyinformationthatAliceandBoblearnisthefunctionvalue!(#, *)andnothingelseabouttheotherparty’sinput.Itisimportanttonoteherethatthefunctionoutput!(#, *)couldrevealsomeinformationabouttheinputs#and*(forexample,inourtrioscenario,whateverdenovovariantwereportinthechild,wecandeducebydefinitiondoesnotexistineitherparent).AswenoteintheDiscussionsection,weworkinthehonest-but-curiousmodelwhereweassumethatAliceandBobfollowtheprotocolspecificationasdirected,butmay,attheendoftheprotocolexecution,trytoinfersomeadditionalinformationabouteachother’sprivateinput.WenowdescribehowYao’sprotocolcanbeusedtosecurelyevaluateanyfunctionovertwoinputsinthehonest-but-curiousmodel.

ToapplyYao’sprotocol,itisfirstnecessarytorepresentthefunction!asaBooleancircuitoninputs#and*.Atthemostbasiclevel,thebuildingblockswehaveareAND(xANDy=Trueonlyifx=y=True,otherwiseitisFalse)andXOR(exclusive-or,xXORy=Trueonlyifx=Trueandy=False,orifx=Falseandy=True,otherwiseitisFalse)gates.Thesebasicgatescanbecombinedtoobtaincircuitsofarbitraryexpressivefunctionalities17.Inourdescriptionbelow,wewilloftentimesrefertotheconcreteexampleofsecurelyevaluatingtheANDfunctiononsingle-bitinputs(above).AvisualizationofthecompleteprotocolisgiveninFigure1.

.CC-BY-NC-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted January 27, 2017. ; https://doi.org/10.1101/103655doi: bioRxiv preprint

Page 9: Revealing the causative variant in Mendelian patient ...computation on encrypted data. These provide different tradeoffs in complexity and efficiency17. In this work, we use Yao’s

9

OverviewofYao’ssecuretwopartycomputationprotocolWenowdescribeYao’sprotocol.Foreaseofpresentation,wepresentasimplified(butlessefficient)descriptionofYao’sprotocolhere.Ourimplementation(basedontheJustGarble36library)followsthehigh-levelblueprintdescribedhere,butincludesseveraloptimizations,notablythefree-XOR37andhalf-gate38optimizations.Step1:Foreachwireinthecircuit,Alicechoosestwokeys.RecallthatinaBooleancircuit,eachwireinthecircuitcantakeontwopossiblevalues(0or1;sometimesalsoreferredtoasFalseandTrue,respectively).Aliceassociatesoneofthekeyswiththewirevalue0andanotherkeywiththewire1.FortheparticularcaseofsecurelyevaluatingtheANDfunction@ = # ∧ *,Alicepicksthreepairsofkeysandassociatesonepairwitheachof#,*,[email protected], 5B$, 5C8, 5C$, 5D8, 5D$.Inthisexample,5B8isthekeyassociatedwiththeinputbit#takingonthevalue0and5D$isthekeyassociatedwiththe

[email protected]:Foreachgateinthecircuit,Aliceconstructsa“garbled”truthtable.Foreachrowinthetruthtable,thealgorithmtakesthekeyassociatedwiththevalueoftheoutputwireanddoubleencryptsitusingthetwokeysassociatedwiththevaluesofthetwoinputwires.FortheparticularcaseofevaluatingasingleANDgate,Alicewouldconstructthefollowingtableofciphertexts

• E$ = EncryptFGH(EncryptFIH 5D8 )

• E& = EncryptFGH(EncryptFIJ 5D8 )

• EK = EncryptFGJ(EncryptFIH 5D8 )

• EL = EncryptFGJ(EncryptFIJ 5D$ )

Fortheoutputwiresofthecircuit,insteadofdoubleencryptinganencryptionkey,Alicedirectlydoubleencryptsthevalueoftheoutputwire(e.g.,0or1).ThisstepisshowninFigure1.2.Step3:Aftergarblingthecircuit(Steps1-2/Figure1.1-1.2),thesecurecomputationbeginswithBobusingtheoblivioustransferprotocol(above)toobtainthekeysfortheinputwiresassociatedwithhisinput.Theoblivioustransferprotocolensuresthefollowing:Bobonlylearnsoneofthetwokeysassociatedwitheachofhisinputwires(thiswillensurethatBobcanonlyevaluatethefunctiononasinglesetofinputs),andAlicedoesnotlearnwhichwireBobrequested(thatis,AlicedoesnotlearnBob’sinput).FortheparticularcaseofevaluatingasingleANDgate,ifBob’sinputis: ∈ {0,1},thenBobwouldplaytheroleofthereceiverinanOTprotocolwithinput:.Alicewouldplaytheroleofthesenderwithmessages5C8,5C$ (thekeysassociatedwithBob’sinputwire).Attheendoftheoblivioustransferprotocol,Bobobtains5C<(thekeyassociatedwithhisinput),andlearnsnothingaboutthekeyassociatedwiththecomplementofhisinput(5C$=<).AlicelearnsnothingaboutBob’sinput:.ThisstepisshowninFigure1.3.Step4:AfterBobreceivesthekeysassociatedwithhisinputviatheoblivioustransferprotocol,AlicesendsBobthegarbledtablesassociatedwitheachgate(afterrandomlypermutingtherowsofeachtable).Additionally,AlicesendsBobthewireencodingsofherinput.FortheparticularcaseofevaluatingasingleANDgate,ifAlice’sinputisO ∈ 0,1 ,Alicewouldsend5BP(thekeyassociatedwith# = O)toBob.ThisstepisalsoshowninFigure1.4.Step5:Withallthisinformation,Bobcancompletethefunctionevaluationandcomputetheoutput.Inparticular,afterSteps3and4,BobshouldhaveasinglekeyforeachoftheinputwiresoftheBooleancircuit.Then,foreachgateinthecircuit,Bobtakestheinputkeyshehasandattemptstodecrypttherowsinthegarbledtableassociatedwiththatgate.Becausetheentriesinthegarbledtablearedouble

.CC-BY-NC-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted January 27, 2017. ; https://doi.org/10.1101/103655doi: bioRxiv preprint

Page 10: Revealing the causative variant in Mendelian patient ...computation on encrypted data. These provide different tradeoffs in complexity and efficiency17. In this work, we use Yao’s

10

encryptedusingthekeysassociatedwiththeinputwirestothegateandBobonlyhasasinglekeyforeachofthewires,BobisonlyabletodecryptasinglerowinthegarbledtableasshowninFigure1.5.Step6:Indoingso,Bobisabletolearnoneofthekeysassociatedwiththegate’soutputwire(moreover,byconstructionofthegarbledtable,theoutputkeyBobobtainsispreciselytheoneassociatedwiththevaluecorrespondingtoevaluationofthegateontheinputbits).Thus,startingwiththeinputwires,Bobisabletoevaluatethecircuitgate-by-gateasseeninFigure1.6.OnceBobreachestheoutputlayerofthecircuit,heisabletodecrypttheciphertextsandobtainthevalueofeachoutputwire.Summary.Tosummarize,inYao’ssecuretwo-partycomputationprotocol,AlicebeginsbyconstructingagarbledtruthtableforeachgateintheBooleancircuit.Shedoessobydoubleencryptingeachrowinthetruthtable(usingthekeysassociatedwiththeinputbits).ShegivesBobthegarbledtruthtablesaswellasthekeysassociatedwithherinput.Usingoblivioustransfer,Bobobtainsthekeysassociatedwithhisinput.Armedwithasinglekeyforeachoftheinputwiresinthecircuit,Bobisabletoevaluatethegarbledcircuitgate-by-gate.Foreachgate,Bobtakeshisinputkeysandusesthemtodecryptoneoftherowsofthegarbledtableassociatedwiththegate.Thisyieldsthekeyassociatedwiththeparticularwire.Finally,attheendofthecomputation,Bobdecryptstheciphertextsassociatedwiththeoutputwirestolearntheoutputofthecircuit.BobthensendstheresultofthecomputationtoAlice.

ExtendingYao’sprotocoltoNpartiesYao’sprotocolallowstwopartiestosecurelyevaluateanarbitraryfunction.However,ingeneral,wedesiretocomputeacrossalargenumberofparties(e.g.,studyparticipants).Whiletherearesecuremultipartycomputationprotocolsthatsupportmorethantwoparties,(e.g.,theSPDZ39,GMW31,orBGW40protocols),akeylimitationoftheseprotocolsisthattheyrequireallparticipatingpartiestobeonlineduringtheprotocolexecution.Moreover,thenumberofroundsofcommunicationintheprotocoloftengrowswiththecomplexityofthecomputation(notethatthisisindirectcontrastwithYao’sprotocolwhichisatwo-roundprotocol,regardlessofhowcomplicatedthecomputationis).Asaresult,therearesubstantialengineeringhurdlestodeployingthesegeneralprotocolsformultipartycomputationacrossalargenumberofparties.Insomecases(e.g.,BGW40andGMW31),thetotalbandwidthalsoscalesquadraticallyinthenumberofparties,furtherlimitingthepracticalityoftheseprotocols.

Amoreefficientsolutionforgeneralmultipartycomputationthatavoidsboththerequirementthatparticipatingpartiesbeonlineduringtheprotocolexecutionaswellasthepotentialcommunicationblowupistoworkina“two-cloud”model.Inthismodel,weassumetherearetwonon-colludingcloudserversthatfacilitatetheprotocolexecution.Atthebeginningoftheprotocolexecution,eachoftheparticipatingparties“split”theirinputsandshareitwiththetwocloudservers.Aslongasthetwocloudsdonotcolludewitheachother,theydonotlearnanythingabouttheinputstothecomputation.Afterthetwocloudservershavereceivedtheinputsfromeachoftheparticipatingparties,theyengageinatwo-partysecurecomputationprotocol(suchasYao’sprotocol)tocomputethefunctionofinterest.Notably,thepartiesthatcontributedthedatadonothavetobeonlineduringthisstepoftheprotocol.Andmoreover,communicationisonlynecessarybetweenthepartiesandthecloudservers;partiesinparticulardonothavetocommunicatewitheachother.Inapracticaldeployment,thesetwocloudserversmightbemanagedbydistinctgovernmentalorganizationswithintheNIHorWHO.Thus,byworkinginthetwo-cloudmodel,itispossibletotransformanycomputationbetween?individualsintoasecuretwo-partycomputationbetweentwonon-colludingparties.

.CC-BY-NC-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted January 27, 2017. ; https://doi.org/10.1101/103655doi: bioRxiv preprint

Page 11: Revealing the causative variant in Mendelian patient ...computation on encrypted data. These provide different tradeoffs in complexity and efficiency17. In this work, we use Yao’s

11

Step7:Wenowdescribehowtosecureevaluateanyfunctionalityinthetwo-cloudmodel.Supposethereare?partiesparticipatingintheprotocolexecutionandlet#$, … , #>denotetheirprivateinputs.Tosecurecomputeafunction!,eachofthe?participantschoosesarandomvalueRS andsendsRS tooneofthetwocloudservers.Theythensendtotheothercloudserverthevalue#S − RS (notethatthesubtractionisperformedmoduloalargeintegerU).OnceeverypartyhassubmittedtheirinputsRS and#S − RS tothetwocloudservers,thefirstcloudserverhasavectorofrandomvaluesO = (R$, … , R>)andthesecondcloudserverhasavectorofrandomdifferences: = (#$ − R$, … , #>– R>).BecausethesubtractionistakingplacemoduloU,thevaluesin:aredistributeduniformlyandmoreimportantly,independentlyofthe#S’s.Thepair(RS, #S − RS)isoftenreferredtoasan“additivesecretsharing”oftheinput#S.Thepropertythatthisadditivesecretsharingschemesatisfiesisthatasinglesharerevealsnoinformationabouttheinput,buttwosharescompletelydefinetheinput.Thismeansthataslongasthetwocloudserversdonotcollude,theylearnnoinformationabouteachparty’sinput(sincetheyeachpossessjustoneshareofthesecret).

Tocompletethesecurecomputation(ofafunction!),thetwocloudserverssimplyapplyYao’sprotocoltothefollowingtwo-partyfunctionality:

W R$, … , R> , #$ − R$, … , #>– R> = ! R$ + #$ − R$, … , R> + #>– R> = ! #$, … , #> .Inotherwords,thetwocloudscomputethefunctionalitythattakesasinputtwovectors(eachcontaining?values)andoutputsthefunction!evaluatedonthecomponentscorrespondingtothesumofthetwoinputvectors.Sincesummingtheinputvectorsinthiscasereconstructseachparty’sinput,thisprocedurecorrespondspreciselytoevaluating!ontheparties’inputs.Moreover,thetwocloudserversdonotlearnanyadditionalinformationaboutanyparticularparty’sinputbecausetheevaluationofWisperformedusingYao’sprotocol(whichisasecuretwo-partycomputationprotocol).ThisprocedureisshowninFigure1.7.

ConstructingourBooleancircuitsAsdescribedabove,arbitraryBooleancircuitscanbeconstructedusingonlyANDandXORgates.Toefficientlyrepresentourset-intersection-basedalgorithmsasBooleancircuits,wefirstconstructsomeintermediatebuildingblocksfromthebasicANDandXORgates.Theintermediatebuildingblockswerequireincludeadditioncircuits,comparisoncircuits,andequalitycircuits.Forthesebuildingblocks,weusethecircuitsbyKolesnikovetal.41(seeSupplementaryFigure5).

SoftwareimplementationInourimplementation,weusetheJustGarblelibrary36forourimplementationofYao’sgarbledcircuits,andweusetheAsharovetal.42implementationoftheoblivioustransferprotocols.Forbetterperformance,wealsoimplementthehalf-gatesoptimization38forYao’sgarbledcircuits.Thisimplementationwillbereleaseduponpublication.Forourbenchmarks,wesetupaclientandserveronAmazonEC2(tosimulatethetwocloudproviders),andmeasurethetotalcomputetime,bandwidth,andoverallprotocolexecutiontime(takingintoaccountthenetworkcommunication).Werunourexperimentsontwomemory-optimizedEC2instances(M4.2xlarge).Eachinstancerunsan8-core2.4GHzIntelXeonE5-2676v3(Haswell)processorandhas32GBofmemory.Whileourprotocolsarenaturallyparallelizable,weuseasinglethreadofexecutioninallofourexperiments,anddonottakeadvantageoftheavailableparallelism.Tosimulatethenon-colludingtwocloudmodel,weusedawide-areanetwork(WAN)settingwherethetwoserversarefarapart.WeplacedoneoftheserversontheWestCoast(specifically,intheNorthernCaliforniaavailabilityzone)andtheotherontheEastCoast(specifically,intheNorthernVirginiaavailabilityzone).

.CC-BY-NC-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted January 27, 2017. ; https://doi.org/10.1101/103655doi: bioRxiv preprint

Page 12: Revealing the causative variant in Mendelian patient ...computation on encrypted data. These provide different tradeoffs in complexity and efficiency17. In this work, we use Yao’s

12

References1. Yang,Y.etal.ClinicalWhole-ExomeSequencingfortheDiagnosisofMendelianDisorders.N.Engl.J.

Med.369,1502–1511(2013).

2. Iglesias,A.etal.Theusefulnessofwhole-exomesequencinginroutineclinicalpractice.Genet.Med.

16,922–931(2014).

3. LeeH,DeignanJL,DorraniN&etal.CLinicalexomesequencingforgeneticidentificationofrare

mendeliandisorders.JAMA312,1880–1887(2014).

4. Rehm,H.L.etal.ACMGclinicallaboratorystandardsfornext-generationsequencing.Genet.Med.

Off.J.Am.Coll.Med.Genet.15,733–747(2013).

5. Lek,M.etal.Analysisofprotein-codinggeneticvariationin60,706humans.Nature536,285–291

(2016).

6. Ng,S.B.etal.Targetedcaptureandmassivelyparallelsequencingof12humanexomes.Nature

461,272–276(2009).

7. Ng,S.B.etal.Exomesequencingidentifiesthecauseofamendeliandisorder.Nat.Genet.42,30–35

(2010).

8. Ng,S.B.etal.ExomesequencingidentifiesMLL2mutationsasacauseofKabukisyndrome.Nat.

Genet.42,790–793(2010).

9. Moreno-Estrada,A.etal.ThegeneticsofMexicorecapitulatesNativeAmericansubstructureand

affectsbiomedicaltraits.Science344,1280–1285(2014).

10. Mailman,M.D.etal.TheNCBIdbGaPdatabaseofgenotypesandphenotypes.Nat.Genet.39,

1181–1186(2007).

11. Siu,L.L.etal.Facilitatingacultureofresponsibleandeffectivesharingofcancergenomedata.Nat.

Med.22,464–471(2016).

.CC-BY-NC-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted January 27, 2017. ; https://doi.org/10.1101/103655doi: bioRxiv preprint

Page 13: Revealing the causative variant in Mendelian patient ...computation on encrypted data. These provide different tradeoffs in complexity and efficiency17. In this work, we use Yao’s

13

12. Shringarpure,S.S.&Bustamante,C.D.PrivacyRisksfromGenomicData-SharingBeacons.Am.J.

Hum.Genet.97,631–646(2015).

13. Regalado,A.NetworksofGenomeDataWillTransformMedicine.MITTechnologyReviewAvailable

at:https://www.technologyreview.com/s/535016/internet-of-dna/.(Accessed:30thSeptember

2016)

14. Wagner,J.,Paulson,J.N.,Wang,X.,Bhattacharjee,B.&CorradaBravo,H.Privacy-preserving

microbiomeanalysisusingsecurecomputation.Bioinformatics32,1873–1879(2016).

15. Simmons,S.,Sahinalp,C.&Berger,B.EnablingPrivacy-PreservingGWASsinHeterogeneousHuman

Populations.CellSyst.3,54–61(2016).

16. Popic,V.&Batzoglou,S.Privacy-PreservingReadMappingUsingLocalitySensitiveHashingand

SecureKmerVoting.bioRxiv046920(2016).doi:10.1101/046920

17. Lindell,Y.&Pinkas,B.SecureMultipartyComputationforPrivacy-PreservingDataMining.J.Priv.

Confidentiality1,59–98(2009).

18. Yao,A.C.-C.ProtocolsforSecureComputations.inAnnualSymposiumonFoundationsofComputer

Science160–164(1982).

19. Simpson,M.A.etal.MutationsinNOTCH2causeHajdu-Cheneysyndrome,adisorderofsevereand

progressiveboneloss.Nat.Genet.43,303–305(2011).

20. Rivière,J.-B.etal.DenovomutationsintheactingenesACTBandACTG1causeBaraitser-Winter

syndrome.Nat.Genet.44,440–444,S1-2(2012).

21. McBride,K.L.etal.NOTCH1mutationsinindividualswithleftventricularoutflowtract

malformationsreduceligand-inducedsignaling.Hum.Mol.Genet.17,2886–2893(2008).

22. Fenske,S.etal.HCN3contributestotheventricularactionpotentialwaveforminthemurineheart.

Circ.Res.109,1015–1023(2011).

.CC-BY-NC-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted January 27, 2017. ; https://doi.org/10.1101/103655doi: bioRxiv preprint

Page 14: Revealing the causative variant in Mendelian patient ...computation on encrypted data. These provide different tradeoffs in complexity and efficiency17. In this work, we use Yao’s

14

23. Lindell,Y.&Pinkas,B.AProofofSecurityofYao’sProtocolforTwo-PartyComputation.JCryptol.22,

161–188(2009).

24. Hazay,C.&Lindell,Y.EfficientSecureTwo-PartyProtocols-TechniquesandConstructions.(Springer,

2010).

25. Dwork,C.DifferentialPrivacy.inICALP1–12(2006).

26. Dinur,I.&Nissim,K.Revealinginformationwhilepreservingprivacy.inPODS202–210(2003).

27. Li,H.&Durbin,R.Fastandaccuratelong-readalignmentwithBurrows-Wheelertransform.

Bioinforma.Oxf.Engl.26,589–595(2010).

28. McKenna,A.etal.TheGenomeAnalysisToolkit:aMapReduceframeworkforanalyzingnext-

generationDNAsequencingdata.GenomeRes.20,1297–1303(2010).

29. Wang,K.,Li,M.&Hakonarson,H.ANNOVAR:functionalannotationofgeneticvariantsfromhigh-

throughputsequencingdata.NucleicAcidsRes.38,e164–e164(2010).

30. Cunningham,F.etal.Ensembl2015.NucleicAcidsRes.43,D662-669(2015).

31. Goldreich,O.,Micali,S.&Wigderson,A.HowtoPlayanyMentalGameorACompletenessTheorem

forProtocolswithHonestMajority.inAnnualACMSymposiumonTheoryofComputing218–229

(1987).

32. Katz,J.&Lindell,Y.IntroductiontoModernCryptography.(ChapmanandHall/CRCPress,2007).

33. Rabin,M.O.HowToExchangeSecretswithObliviousTransfer.IACRCryptol.EPrintArch.2005,187

(2005).

34. Kilian,J.FoundingCryptographyonObliviousTransfer.inProceedingsofthe20thAnnualACM

SymposiumonTheoryofComputing,May2-4,1988,Chicago,Illinois,USA20–31(1988).

35. Naor,M.&Pinkas,B.ObliviousTransferandPolynomialEvaluation.inProceedingsoftheThirty-First

AnnualACMSymposiumonTheoryofComputing,May1-4,1999,Atlanta,Georgia,USA245–254

(1999).

.CC-BY-NC-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted January 27, 2017. ; https://doi.org/10.1101/103655doi: bioRxiv preprint

Page 15: Revealing the causative variant in Mendelian patient ...computation on encrypted data. These provide different tradeoffs in complexity and efficiency17. In this work, we use Yao’s

15

36. Bellare,M.,Hoang,V.T.,Keelveedhi,S.&Rogaway,P.EfficientGarblingfromaFixed-Key

Blockcipher.inIEEESymposiumonSecurityandPrivacy478–492(2013).

37. Kolesnikov,V.&Schneider,T.ImprovedGarbledCircuit:FreeXORGatesandApplications.in

InternationalColloquiumonAutomata,LanguagesandProgramming486–498(2008).

38. Zahur,S.,Rosulek,M.&Evans,D.TwoHalvesMakeaWhole-ReducingDataTransferinGarbled

CircuitsUsingHalfGates.inEUROCRYPT220–250(2015).

39. Damgard,I.,Pastro,V.,Smart,N.P.&Zakarias,S.MultipartyComputationfromSomewhat

HomomorphicEncryption.inCRYPTO643–662(2012).

40. Ben-Or,M.,Goldwasser,S.&Wigderson,A.CompletenessTheoremsforNon-CryptographicFault-

TolerantDistributedComputation(ExtendedAbstract).inSTOC1–10(1988).

41. Kolesnikov,V.,Sadeghi,A.-R.&Schneider,T.ImprovedGarbledCircuitBuildingBlocksand

ApplicationstoAuctionsandComputingMinima.inCryptologyandNetworkSecurity1–20(2009).

42. Asharov,G.,Lindell,Y.,Schneider,T.&Zohner,M.Moreefficientoblivioustransferandextensions

forfastersecurecomputation.inACMCCS535–548(2013).

AuthorContributionsKJ,DW,DBandGBdesignedthestudy,analyzedresultsandwrotethemanuscript.KJandJBprocessedpatientdata.KJandDWwrotesoftwarefortheanalysis.

AcknowledgementsWethankDr.JonBernsteinandmembersoftheBonehandBejeranolabsforvaluablediscussionsandprojectfeedback.WealsothankStanfordpatientsandclinicians,aswellasthepatientsandprofessionalsinvolvedinthedepositionofthedbGaPsetsweuse.

.CC-BY-NC-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted January 27, 2017. ; https://doi.org/10.1101/103655doi: bioRxiv preprint

Page 16: Revealing the causative variant in Mendelian patient ...computation on encrypted data. These provide different tradeoffs in complexity and efficiency17. In this work, we use Yao’s

16

Figures/Tables

Figure1

Alice prepares:

2 types of big boxes + keys(for each value she may be holding)

2 types of small boxes + keys(for each value Bob may be holding)

2 types of notes(with the 2 possible outcomes)

The notes fit in the small boxes,that lock and fit into the big boxes,that also lock.

T F

BT BF

AT AF

Alice holds a secret value a:a = True or a = False.

Bob holds a secret value b:b = True or b= False.

Alice and Bob want to compute togetherf(a,b) = a AND bwithout Bob discovering anything about a,or Alice discovering anything about b.

0 1

Alice puts on the table the two keys (labeled BT and BF) she prepared for the small boxes and leaves the room:

Bob enters the room and picks up the appropriate key “Bb”: BT if his secret value b = True, BF if b = False. After picking up his key, he leaves the room. Alice then gives Bob all four unmarked big locked boxes. She also gives him one unmarked key “Aa”: AT if her secret value a = True, AF if a = False.

BT BF

Bob now holds 4 unmarked big locked boxes,A key from Alice Aa, and his own key Bb.

He tries to get the note from all four boxes,using Aa on the big boxes, and Bb on the small ones.

By design Bob can only reach a single note.

This note holds the correct answer for a AND b,that Alice and Bob set out to compute together.

Alice has learned nothing about Bob’s value b(she has left the room before Bob picked his key).

Bob has learned nothing about Alice’s value a(he received from Alice an unlabeled key).

3

4

5

a AND bAlice

Bob

a

b

Alice

Bob

.

.

.

.

.

.f(A,B)

Instead of providing ananswer, Alice providesthe correct unmarkedkey for the next step.

A

B

G1 G2 Gn

1 0 0 … 1PrivateGenome

0 0 1 … 1RandomNumber

1 0 1 … 0

1 1 1 … 0

1 1 0 … 0

0 0 1 … 0

Computer AWest Coast

Computer BEast Coast

R1R2 Rn

…G1-R1 G2-R2

Gn-Rn

f(A,B) = f((G1-R1,G2-R2,…,Gn-Rn),(R1,R2,…,Rn)) =f(G1-R1+R1, G1-R2+R2,…,Gn-Rn+Rn) =f(G1,G2,…Gn) = secure computation

with all n genomes

……

R1 R2 Rn

76

Aa

TAlice’s a Bob’s b f(a,b) = a AND b

T T T

T F F

F T F

F F F

T

F

F

F

BT AT

=

FBF AT

=

FBT AF

=

FBF AF

=

Alice prepares four big locked boxesmatching all four possible computations:

2

.CC-BY-NC-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted January 27, 2017. ; https://doi.org/10.1101/103655doi: bioRxiv preprint

Page 17: Revealing the causative variant in Mendelian patient ...computation on encrypted data. These provide different tradeoffs in complexity and efficiency17. In this work, we use Yao’s

17

Table1Operation RelevantInformationforeachOperation RunningTimeMeasurements(a) MAX

(overgenes)Scenario:Smalldiseasecohort

#unrelatedprobands(whoavoid

openlysharingtheirdata)

Rarefunctionalvariants

(genes)perproband(median)

#probandswithrarefunctionalvariant/singene(top3,descending

order)

Genename Provencausalgenefordisease

Protectionquotient

(1-#ofvariantssharedoftopgene/total#ofvariants)

Bandwidth(GB)

Compute(sec)

Network(sec)

FreemanSheldonSyndrome

3 258(253)3 MYH3

MYH3 1-3/767=99.6% 0.02 .15 4.912 DBT

1 ACADVLHajdu-Cheney

Syndrome7 278(272)

6 NOTCH2NOTCH2 1-8/1853=

99.6% 0.03 .18 7.293 HLA-DRBI3 MCC

KabukiSyndrome 10 262(257)

8 KMT2DKMT2D 1–8/2754=

99.7% 0.04 .22 9.593 COL6A13 FLNB

MillerSyndrome 4 267(258)

4 DHODHDHODH 1–8/1063=

99.3% 0.03 .18 7.293 DNAH52 ACOX2

(b) SETDIFF(overvariants)

Scenario:familial

PatientID(avoidsharingwprovider)

#rarefunctionalvariants

#probandonlyvariants(revealed)

Genename Provencausalgene

Protectionquotient

Bandwidth(GB)

Compute(min)

Network(min)

Trio

115-f 185 N/R N/R

ACTB 1-2/524=99.6% 18.1 1.7 56.7115-m 164 N/R N/R

115-a1 175 2ACTBUSH2A

(c) INTERSECTION(overvariants)

Scenario:2Hospitals

#suspiciousvariants

(notshared)

Totalintersectingvariants(forpatientphenotypecomparisonfollow-up)

Protectionquotient Bandwidth

(GB)Compute(min)

Network(min)

Washington 5,734 159 1–318/11,067=97.1% 3.1 0.37 9.4

Baylor 5,333

.CC-BY-NC-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted January 27, 2017. ; https://doi.org/10.1101/103655doi: bioRxiv preprint

Page 18: Revealing the causative variant in Mendelian patient ...computation on encrypted data. These provide different tradeoffs in complexity and efficiency17. In this work, we use Yao’s

18

SupplementaryFiguresandTables

SupplementaryFigure1

SupplementaryFigure2

A GenotypeData B

PositionArray

C

SmallCohort

gene

geneposition reference

alternate

GeneArray

… F F T F …

… 0 1 0 … 0 1 0 … 0 1 0 …

m f

a1

… 0 1 0 … 0 1 0 … 0 1 0 …

… 0 0 0 … 1 1 0 … 0 0 1 …

… 0 1 0 … 0 1 0 … 1 0 0 …

… 0 2 0 … 1 3 0 … 1 1 1 …

MAX

SmallFamily

… F F … T …

… F F … F …

… F T … T …

… F T … F …

TwoHospitals

… T F F T …

… F F T T …AND

Hospital1

Hospital2

… F F F T …

VectorRepresentation

m

f

a1

SETDIFF

INTERSECTION

.CC-BY-NC-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted January 27, 2017. ; https://doi.org/10.1101/103655doi: bioRxiv preprint

Page 19: Revealing the causative variant in Mendelian patient ...computation on encrypted data. These provide different tradeoffs in complexity and efficiency17. In this work, we use Yao’s

19

SupplementaryFigure3

SupplementaryFigure4

.CC-BY-NC-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted January 27, 2017. ; https://doi.org/10.1101/103655doi: bioRxiv preprint

Page 20: Revealing the causative variant in Mendelian patient ...computation on encrypted data. These provide different tradeoffs in complexity and efficiency17. In this work, we use Yao’s

20

SupplementaryFigure5

.CC-BY-NC-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted January 27, 2017. ; https://doi.org/10.1101/103655doi: bioRxiv preprint

Page 21: Revealing the causative variant in Mendelian patient ...computation on encrypted data. These provide different tradeoffs in complexity and efficiency17. In this work, we use Yao’s

21

FigureandTableLegendsFigure1.Yao’sprotocolforsecuremultipartycomputation.Steps0-7describetheoverallsecureprotocolforcomputinganyfunctionFbetweentwoormorepartiesF(A,B,..,Y,Z).Wefirstdescribea

securetwo-partycomputationprotocolbetweenAlice(A)andBob(B).Step0:AliceandBobaretryingtocomputeajointfunctionwithoutrevealingtheirinputstotheotherparty.Step1:Alicecreatesakey/boxforeachpossiblevalueforeachinput(0or1).Step2:Alicedoublelocks(doubleencrypts)eachofthefourpossibleoutputsbyplacingtherelevantoutputnoteintwoboxescorrespondingtoeach

combinationofthetwoinputs.Step3:AlicegivesBobtheoptionofchoosingexactlyoneoftwopossiblekeys,labeledBTandBF.Step4:BobpicksupexactlyonekeyBbwherebcorrespondstohis

hiddeninputwhichonlyheknows(theoblivioustransferprotocolensuresthatBobcanonlypickuponekey).AfterBobmakeshisselection,Aliceshufflesthedoubly-lockedboxesandhandsthemtoBobalong

withthekeyAacorrespondingtoherinputa.Steps1-4isrepeatedforeachoftheinputstothefunction.

Step5:Foreachoperatorinthefunctionthatdependsonlyoninputvalues(i.e.,thefirst“layer”ofthecircuit),Bobhasfourdoubly-lockedboxesandtwokeysAaandBbbuthedoesnotknowAlice’sinput

andAlicedoesnotknowBob’sinput.HeusesAaandBbandtriestounlockallfourboxes.Onlyoneof

thefourdoubly-lockedboxeswillsuccessfullyopen,revealingthejointoutputwithoutrevealingAlice’s

orBob’sinputs.Step6:Therevealedoutputyieldsthekeyforthenextoperation(gate)inthecircuit.Steps5and6arerepeatedforeachoperationinthefunction.Attheendofthecomputation,insteadof

keys,Bobobtainsthevaluesthatmakeuptheoutputofthecomputation.Step7:Thissecuretwo-partycomputationprocesscanbeexpandedtoNpartiesbyusingadditivesecretsharingbetweentwonon-

colludingcloudservers.TheN-inputfunctionisthustransformedintoatwo-inputfunction.

Table1.Summaryofresultsfordifferentsecuregenomicmultipartycomputationscenarios,allusingrealpatientdata.

SupplementaryFigure1.Representinggenomicdataasvectorsforsecurecomputation.(A)Eachindividualholdstheirpersonalgenomeprivate.(B)Theyareaskedtofillinapositionarray/vectorwith

TrueandFalsevaluesdependingonwhethertheyhaveararefunctionalvariantatthelistedposition,or

a0/1valueinagenearraydependingonwhethertheyhavenone/somerarefunctionalvariant/sin

eachlistedgene.(C)Theresultingposition/genevectorsareusedtoobtaintheresultsofTable1.

SupplementaryFigure2.Encryptionanddecryptionoverview.Asecret-keyencryptionschemeconsists

ofthreealgorithms:(A)asetupalgorithmwhichoutputsasecretkey(usuallyalongrandomstring);(B)

anencryptionalgorithmthattakesinasecretkey!andamessage"andproducesanencryptionof"

(calledaciphertext);and(C)adecryptionalgorithmthattakesinthesamesecretkey!andaciphertextandproducestheoriginalmessage.WewriteEnc&(")todenoteanencryptionofthemessage"under

thesecretkey!.Thecorrectnessrequirementforanencryptionschemestatesthatdecryptingthe

ciphertextoutputbyEnc&(")usingthesecretkey!shouldyieldtheoriginalmessage(plaintext)".(D)

Thesecurityrequirementforasecret-keyencryptionschemestatesthatanyonewhodoesnotpossessthesecret-key!cannotdistinguishanencryptionofamessage")fromanencryptionofamessage"*,irrespectiveofthechoiceofmessages")and"*.Inotherwords,withoutthesecretkey,theciphertextdoesnotrevealanyinformationabouttheencryptedmessage.

SupplementaryFigure3.Computationusingcircuits.(A)ABooleancircuitconsistsofasequenceoflogicgates(e.g.,ANDgates,ORgates,andNOTgates).Eachlogicgatetakesoneortwobitsasinputand

producesasinglebitofoutput.InaBooleancircuit,theoutputsofonelogicgatecanbeusedasthe

.CC-BY-NC-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted January 27, 2017. ; https://doi.org/10.1101/103655doi: bioRxiv preprint

Page 22: Revealing the causative variant in Mendelian patient ...computation on encrypted data. These provide different tradeoffs in complexity and efficiency17. In this work, we use Yao’s

22

inputtoanotherlogicgate.Werefertothesevaluesastheintermediatevaluesinthecomputation.In

thecircuitdepictedinthefigure,theinputstothecircuitaredenoted+), +*, +-, +., +/andtheoutputsofthecircuitaredenoted0), 0*.Specifically,thisparticularcircuitimplementsafunctionoverfiveinput

bitsandproducestwooutputbits.(B)EachgateintheBooleancircuitcanbedescribedbyatruthtable

thatspecifiesthemappingbetweeneachconfigurationoftheinputbitstoacorrespondingoutputbit.

InthecaseofanANDgate,therearetwoinputbits,andtheoutputis1ifandonlyifbothinputbitsare1.Otherwise,theoutputis0.

SupplementaryFigure4.Performancescaleupforsecurecomputation.Bandwidth,compute(CPU)

time,andoverallprotocolexecution(wallclock)timeforthesecureMAX,SETDIFFandINTERSECTION

scenariosofTable1,usingasinglethreadontwoservers,onelocatedontheEastCoastandtheother

ontheWestCoast.(A)Whenincreasingthenumberofunrelatedsubjectsinasmallcohortstudy,all

parametersgrowlogarithmically.(B)Whenincreasingthenumberoffamilymembersinanaffected/

nonaffectedscenario,parametersalsogrowlogarithmically.(C)Inthetwohospitalscenario,when

increasingthenumberofgenomicpositionsofpotentialinterest(e.g.,fromtheexometothenon-

codinggenome),allparametersgrowlinearly.Notethatallthreescenarios(A-C)performthebulkof

theircomputationoneachelementoftheinputvectorseparately(SupplementaryFigure1c).All

scenariosarethussimpletoparallelizeformaximumspeed-upusingmultiplethreadsandnodes.

SupplementaryFigure5.Booleanbuildingblocksforcomposingcomplexfunctions.(A)Thebasicbuildingblocksweusetobuildourcircuitsforidentifyingcommonmutationsandshareddenovo

variantsincludeaddition,comparison,equality,andmultiplexercircuits.AnadditioncircuitADD& on!bitinputstakestwo!-bitvaluesandoutputsthe!-bitrepresentationoftheirsum(additionis

performedmodulo2&).TheLT& andEQ& circuitsimplementtheless-thanandequalityoperations,

respectively,on!-bitinputs.TheMUX& circuitimplementsamultiplexercircuitwhichoninputsa

selectionbit: ∈ 0,1 andtwo!-bitvalues+>, +),outputs+?.TheindividualcircuitscanbeefficientlyconstructedusingANDgatesandXORgates,asdescribedbyKolesnikovetal

41.Thesebasiccircuit

buildingblockscanbecomposedtobuildamaxcircuitMAX& ontwoinputs(eachoflength!),whichinturncanbeusedtobuildamaxcircuiton@inputs.(B)Thiscircuitcomputestheargmaxover@additivelysecret-sharedvaluesA), … , AC.Thecircuitoperatesbyfirstcombiningthesharesandthen

takingthemaxovertheresultingvectorofvalues.Theargmaxisrepresentedbyabit-stringoflength@,whereapositionDhasvalue1ifAE isequaltothemaxvalue,and0otherwise.(C)Thiscircuitcomputes

thesetofgenes(representedbyindices)thatarepresentinatestvectorA), … , ACbutnotpresentinapoolF), … , FC.Thecountsofthemutationsappearinginthepoolareadditivelysecretshared.Thecircuit

firstcombinestheshares,andthenidentifiestheindicesDthatappearinthetestvector(AE = 1),butnotpresentinthepool(FE = 0).Thecircuitoutputs:E = 1ifthegeneindexedbyDoccursinthetargetgenomebutnotinthetestpool,and0otherwise.

.CC-BY-NC-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted January 27, 2017. ; https://doi.org/10.1101/103655doi: bioRxiv preprint