463.9 Health Information Technology · 2020. 3. 5. · –12 multi-choice questions + 6 essay...

Post on 21-Jan-2021


463.9 Health Information Technology

Computer Security II, CS463/ECE424

University of Illinois

Announcement

• Midterm
  – March 10 (Tue), during class, this room (SC 0216)
  – Closed book, closed note, no calculator
  – 12 multi-choice questions + 6 essay questions (sample questions posted via Git)
• Tips
  – Focus on slides, watch videos (https://echo360.org/)
  – Practice with quiz questions and sample questions

Outline

• Privacy and genomic data
• Privacy-protecting genomic research using
  – Private set intersection
  – SGX

Genome

• Contains all of the biological information needed to build and maintain a “living example” of an organism
• Encoded in DNA, one polymer of nucleotides
  – A, G, C, T
• Human Genome:
  – Approximately 3 billion nucleotides
  – Stored in 23 chromosome pairs (plus mtDNA)

[BaldiBDGT11]

Cost Per Genome


New Frontiers

• Better understanding of human genome
• Many individuals have access to key parts of their genomes
• Precision medicine enabled
• Testing possible not only in-vitro but also in-silico

Privacy Concerns

• Genomic data carry sensitive information that may reveal
  – identity,
  – predisposition to diseases,
  – and even facial features.
• Disclosure may propagate the privacy risks to blood relatives.
• Individuals have marked differences in the way they want their data utilized for research.
• Data are irrevocable once they are disseminated.
• New privacy threats may emerge over time with new discoveries in human genetics and the advance of attack methods.
  – Many aggregate results have been removed from the public domain hosted by NIH.

Genetic Exceptionalism: How Special is Genomic Data?

Evans, James P., and Wylie Burke. "Genetic exceptionalism. Too much of a good thing?." Genetics in Medicine 10, no. 7 (2008): 500-501.

McGuire, Amy L., Rebecca Fisher, Paul Cusenza, Kathy Hudson, Mark A. Rothstein, Deven McGraw, Stephen Matteson, John Glaser, and Douglas E. Henley. "Confidentiality, privacy, and security of genetic and genomic test information in electronic health records: points to consider." Genetics in Medicine 10, no. 7 (2008): 495-499.

[Naveed15]

PETs for Computation on Genomic Data

• Homomorphic encryption
• Differential privacy
• Secret sharing
• Secure multi-party computation (MPC)
  – Garbled circuits
• Secure two-party computation
  – Private Set Intersection (PSI)
• Trusted execution environments
  – SGX

Privacy-Preserving Genetic Paternity Test (1 of 2)

Strawman Approach for Paternity Test

• On average, ~99.5% of any two human genomes are identical
• Parents and children have even more similar genomes
• Compare candidate’s genome with that of the alleged child:
  – Test positive if % of matching nucleotides is > 99.5 + τ

First-Attempt Privacy-Preserving Protocol

• Use an appropriate secure two-party protocol for the comparison
• PROs: High accuracy and error resilience
• CONs: Performance not promising (3 billion symbols in input)
  – Experiments showed computation takes a few days

[Baldi11]
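The strawman rule (test positive if the percentage of matching nucleotides exceeds 99.5 + τ) can be sketched in plaintext as follows. This is only an illustration with no privacy protection: it assumes two already-aligned sequences of equal length, and the function names, the `baseline` parameter, and the sample data are hypothetical.

```python
def matching_percentage(a: str, b: str) -> float:
    """Percentage of positions at which two aligned sequences agree."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def strawman_paternity_test(candidate: str, child: str, tau: float,
                            baseline: float = 99.5) -> bool:
    """Positive if the match percentage exceeds baseline + tau (assumed rule)."""
    return matching_percentage(candidate, child) > baseline + tau


# Toy data: 1000 nucleotides, candidate differs from child in 2 positions.
child = "ACGT" * 250
candidate = "TT" + child[2:]
print(strawman_paternity_test(candidate, child, tau=0.2))
```

Run on full genomes, the same comparison touches about 3 billion symbols, which is why wrapping it in a generic secure two-party protocol is slow.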

Privacy-Preserving Genetic Paternity Test (2 of 2)

• Improved Protocol
  – ~99.5% of any two human genomes are identical
  – Why don’t we compare only the remaining 0.5%?

But… We don’t know (yet) where exactly these 0.5% occur!

Using Private Set Intersection Cardinality for privacy-preserving comparison, it takes about 1 hour

Image from dna-testing-for-paternity.com

Private Set Intersection Cardinality (PSI-CA)

[Figure: PSI-CA between a Server with input S={s1,…,sw} and a Client with input C={c1,…,cw}. The Client's output is |S∩C|; the Server's output is ⊥.]
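To make the PSI-CA functionality concrete, here is a toy sketch of a classic Diffie-Hellman-style construction in the semi-honest model. It is not necessarily the protocol used in [BaldiBDGT11]. Both roles run in one process for readability, all names are hypothetical, and the modulus is a convenient Mersenne prime rather than a vetted cryptographic group, so this is not secure as written.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # Mersenne prime used as a toy modulus (assumption, not a vetted group)


def hash_to_group(item: str) -> int:
    """Map an item to a nonzero element of Z_P* via SHA-256."""
    h = int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P
    return h or 1


def random_exponent() -> int:
    """Secret exponent coprime to P-1, so x -> x^e is a bijection on Z_P*."""
    while True:
        e = secrets.randbelow(P - 3) + 2
        if math.gcd(e, P - 1) == 1:
            return e


def psi_ca(server_set: set, client_set: set) -> int:
    a = random_exponent()  # client's secret
    b = random_exponent()  # server's secret

    # Client -> Server: blinded client items H(c)^a.
    client_msg = [pow(hash_to_group(c), a, P) for c in client_set]

    # Server: re-blind to H(c)^(ab) and shuffle so the client cannot tell
    # which of its items matched; also send its own items as H(s)^b.
    double_blinded = [pow(x, b, P) for x in client_msg]
    secrets.SystemRandom().shuffle(double_blinded)
    server_msg = [pow(hash_to_group(s), b, P) for s in server_set]

    # Client: raise the server's items to a, giving H(s)^(ab), and count matches.
    server_double = {pow(y, a, P) for y in server_msg}
    return sum(1 for x in double_blinded if x in server_double)


print(psi_ca({"s1", "s2", "s3"}, {"s2", "s3", "c9"}))
```

The shuffle is what reduces plain PSI to cardinality only: the client learns how many items match, while the server in this sketch receives no output.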

PPGT Strategy

• In-vitro emulation: RFLP-based paternity test
  – Restriction Fragment Length Polymorphism (RFLP) analysis: a difference between samples of homologous DNA molecules from differing locations of restriction enzyme sites
  – DNA sample is cut into fragments by enzymes
• Fragments separated according to their lengths by gel electrophoresis
• Paternity test is positive if enough fragments have the same length
• RFLP-based PPGPT: reduction to PSI-CA
  – Participants: “client” (receives the result), “server” (remains oblivious)
  – Public input: threshold τ, enzymes E = {e1,...,ej}, markers M = {mk1,...,mkl}
  – Private input: digitized genomes

Privacy-Preserving RFLP-based Paternity Test

[Figure: Private Set Intersection Cardinality → Test Result (#fragments with same length)]
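The RFLP-based strategy can be sketched in software as below: digest each digitized genome at enzyme recognition sites, collect fragment lengths, and count the lengths both parties share. This is a simplified plaintext illustration. It ignores the marker set M, the function names and the `>= tau` decision rule are assumptions, and a real deployment would replace the plaintext set intersection with PSI-CA. The EcoRI and HindIII recognition sequences are the standard ones, each cut after the first base.

```python
import re

# Enzyme name -> (recognition sequence, cut offset within the site).
ENZYMES = {"EcoRI": ("GAATTC", 1), "HindIII": ("AAGCTT", 1)}


def digest(genome: str, site: str, offset: int) -> list:
    """Cut the genome at every occurrence of the site; return fragment lengths."""
    # Lookahead finds every occurrence, including overlapping ones.
    cuts = [m.start() + offset
            for m in re.finditer(f"(?={re.escape(site)})", genome)]
    bounds = [0, *cuts, len(genome)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def fragment_set(genome: str) -> set:
    """Set of (enzyme, fragment length) pairs: a party's PSI-CA input here."""
    return {(name, length)
            for name, (site, offset) in ENZYMES.items()
            for length in digest(genome, site, offset)}


def rflp_match(genome_a: str, genome_b: str, tau: int):
    """Count common fragments; positive if the count reaches tau (assumed rule)."""
    common = len(fragment_set(genome_a) & fragment_set(genome_b))
    return common, common >= tau


print(digest("AAGAATTCCC", *ENZYMES["EcoRI"]))
```

Tagging each length with its enzyme keeps equal-length fragments from different digests from being counted as matches.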

Remarks

• Why compare fragment lengths?
  – Isn’t it more accurate to compare actual contents?
  – In reality, RFLP yields “false positives” with very low probability
  – This approach increases resilience to sequencing errors
• Performance Evaluation
  – About 1 min pre-processing to emulate enzyme digestion process
  – About 10 ms computation time on Intel Core i5 with 25 fragments
  – Less than 1 s on a smartphone (Nokia N900, 600 MHz CPU)
  – Extending to 50 fragments doubles computation time and increases accuracy by orders of magnitude
  – Communication overhead: only a few KBs

Kawasaki Disease

• Kawasaki Disease, also known as KD or mucocutaneous lymph node syndrome, is a disease in which blood vessels throughout the body become inflamed.
• It is rare (about 1 in 1000 under age of 5).
• Its cause is not well understood, but there seem to be both genetic and environmental effects.
• It can be serious and hard to treat.

PRINCESS Study on Kawasaki

• Challenge to get enough subjects to provide statistical power for studies of a rare disease.
  – No studies on KD genomics for African Americans
• Sharing genomic data is complicated by privacy rules of many institutions and governments.
• Approach: PRINCESS framework for Privacy-protecting Rare disease International Network Collaboration via Encryption through Software guard extensionS

[Chen16]

PRINCESS Framework


Genome Analysis of KD Trios

• Transmission Disequilibrium Test (TDT) is a family-based test for disease traits that uses the genotype information from both parents and a child.
• Used to test seventy-two Kawasaki disease (KD) children and their biological parents from:
  – Rady Children’s Hospital San Diego (RCHSD) (N = 45),
  – Emory University (N = 21) in Atlanta, and
  – Imperial College in London (N = 6)
• Examined > 695,784 SNPs
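For reference, the textbook TDT statistic for one biallelic SNP counts, over heterozygous parents, how often the candidate allele was transmitted (b) versus not transmitted (c), and computes χ² = (b − c)² / (b + c) with one degree of freedom. The sketch below is a plaintext illustration, not the PRINCESS implementation. The genotype encoding (two-character strings such as "Aa") and all names are assumptions.

```python
import math


def tdt(trios, allele="A"):
    """Return (b, c, chi2, p_value) for (father, mother, child) genotype trios."""
    b = c = 0
    for father, mother, child in trios:
        parents = (father, mother)
        het = sum(1 for p in parents if p.count(allele) == 1)
        hom = sum(1 for p in parents if p.count(allele) == 2)
        # Homozygous carriers always transmit the allele; any remaining
        # copies in the child must come from heterozygous parents.
        transmitted = child.count(allele) - hom
        if not 0 <= transmitted <= het:
            raise ValueError(f"Mendelian inconsistency: {father} x {mother} -> {child}")
        b += transmitted
        c += het - transmitted
    chi2 = (b - c) ** 2 / (b + c) if b + c else 0.0
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return b, c, chi2, p_value


trios = [("Aa", "aa", "Aa"), ("Aa", "aa", "Aa"), ("Aa", "aa", "Aa"),
         ("Aa", "Aa", "AA"), ("Aa", "aa", "aa")]
print(tdt(trios))
```

Only heterozygous parents contribute to b and c, which is why the test needs genotypes from both parents as well as the child.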

PRINCESS Security Architecture


PRINCESS Data Management


Efficiency Hypothesis

• Solutions based on SGX are not expected to introduce significant computational overhead or big restrictions on data analysis operations.
• By contrast, these are common to software-based techniques such as the SMC (garbled circuit) FlexSC framework and Homomorphic Encryption based HElib framework.
• SGX therefore makes secure large-scale, inter-continental, genetic analysis feasible in practice.

Performance Study

• Used simulated data to produce scale tests.
• Used AWS nodes for multiple clients (up to 12).
• Assorted conservative compromises were made in measuring crypto analytics
  – For instance, HElib does not support division, so only addition and multiplication operations were measured.

Performance Comparison

Performance Breakdown

Identified SNPs

References

• [BaldiBDGT11] Countering GATTACA: Efficient and Secure Testing of Fully-Sequenced Human Genomes. Pierre Baldi, Roberta Baronio, Emiliano De Cristofaro, Paolo Gasti, and Gene Tsudik. CCS 2011.
• [Naveed15] Privacy in the Genomic Era. Muhammad Naveed, Erman Ayday, Ellen W. Clayton, Jacques Fellay, Carl A. Gunter, Jean-Pierre Hubaux, Bradley A. Malin, and XiaoFeng Wang. ACM Computing Surveys 48, 1, Article 6, August 2015. Associated online tutorial on genomics for computer scientists.

The Genomic Data Chasm

Clinical
• Tests for specialists seeking genetic markers for diagnosis or treatment of specific conditions
• Whole Genome Sequencing (WGS) for
  – PCPs who want to identify high likelihood concerns
  – Researchers
  – Subsequent “in silico” testing

Direct to Consumer (DTC)
• Enables broad access at low cost for diverse reasons
• Examples: paternity testing and genealogy studies
• Controversial issues with quality of results and their interpretation
• Disruptive influence on clinical testing

General Practitioner Report versus Medical Geneticist Report

Design Approaches for the Display of Genetic Test Results, C Bushell, M Ferber, L Gatzke, K Johnson, V Jongeneel, K Schahl. Individualizing Medicine Conference, 2012.

Sample DTC Report on Genomic Susceptibility to Disease

23andme.com.

Details on Markers

23andme.com.

Adventurous DTC Vendors

genepartner.com.

Discussion

• What security and privacy issues are raised by DTC genomics?
• How would you like to see your DNA data managed? What about the DNA of your relatives?
• Should it be legal to obtain your DNA without your consent?