463.9 Health Information Technology · 2020. 3. 5. · –12 multi-choice questions + 6 essay...

463.9 Health Information Technology Computer Security II CS463/ECE424 University of Illinois


463.9 Health Information Technology

Computer Security II CS463/ECE424

University of Illinois


Announcement

• Midterm
– March 10 (Tue), during class, this room (SC 0216)
– Closed book, closed note, no calculator
– 12 multi-choice questions + 6 essay questions (sample questions posted via Git)
• Tips
– Focus on slides, watch videos (https://echo360.org/)
– Practice with quiz questions and sample questions


Outline

• Privacy and genomic data
• Privacy protecting genomic research using
– Private set intersection
– SGX


Genome

• Contains all of the biological information needed to build and maintain a “living example” of an organism
• Encoded in DNA, a polymer of nucleotides
– A, G, C, T
• Human Genome:
– Approximately 3 billion nucleotides
– Stored in 23 chromosome pairs (plus mtDNA)


[BaldiBDGT11]


Cost Per Genome


New Frontiers

• Better understanding of human genome
• Many individuals have access to key parts of their genomes
• Precision medicine enabled
• Testing possible not only in-vitro but also in-silico


Privacy Concerns

• Genomic data carry sensitive information that may reveal
– identity,
– predisposition to diseases,
– and even facial features.
• Disclosure may propagate the privacy risks to blood relatives.
• Individuals have marked differences in the way they want their data utilized for research.
• Data are irrevocable once they are disseminated.
• New privacy threats may emerge over time with new discoveries of human genetics and the advance of attack methods.
– Many aggregate results have been removed from the public domain hosted by NIH.


Genetic Exceptionalism: How Special is Genomic Data?

Evans, James P., and Wylie Burke. "Genetic exceptionalism. Too much of a good thing?." Genetics in Medicine 10, no. 7 (2008): 500-501.

McGuire, Amy L., Rebecca Fisher, Paul Cusenza, Kathy Hudson, Mark A. Rothstein, Deven McGraw, Stephen Matteson, John Glaser, and Douglas E. Henley. "Confidentiality, privacy, and security of genetic and genomic test information in electronic health records: points to consider." Genetics in Medicine 10, no. 7 (2008): 495-499.

[Naveed15]


PETs for Computation on Genomic Data

• Homomorphic encryption
• Differential privacy
• Secret sharing
• Secure multi-party computation (MPC)
– Garbled circuits
• Secure two-party computation
– Private Set Intersection (PSI)
• Trusted execution environments
– SGX


Privacy-Preserving Genetic Paternity Test (1 of 2)

Strawman Approach for Paternity Test
• On average, ~99.5% of any two human genomes are identical
• Parents and children have even more similar genomes
• Compare candidate’s genome with that of the alleged child:
– Test positive if % of matching nucleotides is > 99.5 + τ

First-Attempt Privacy-Preserving Protocol
• Use an appropriate secure two-party protocol for the comparison
• PROs: High accuracy and error resilience
• CONs: Performance not promising (3 billion symbols in input)
– Experiments showed computation takes a few days

[Baldi11]
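
The strawman test can be sketched in a few lines of Python. This is a minimal illustration without any privacy protection, and it assumes the two genomes are already aligned strings of equal length. The function names and the toy inputs are hypothetical and do not come from [Baldi11].

```python
def matching_percentage(genome_a: str, genome_b: str) -> float:
    """Percentage of positions at which two aligned genomes agree."""
    if not genome_a or len(genome_a) != len(genome_b):
        raise ValueError("genomes must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(genome_a, genome_b))
    return 100.0 * matches / len(genome_a)


def strawman_paternity_test(candidate: str, child: str, tau: float) -> bool:
    """Positive if the share of matching nucleotides exceeds 99.5 + tau percent."""
    return matching_percentage(candidate, child) > 99.5 + tau


if __name__ == "__main__":
    child = "ACGT" * 250                 # toy 1000-nucleotide "genome"
    candidate = "CG" + child[2:]         # differs from the child in 2 positions
    print(matching_percentage(candidate, child))
    print(strawman_paternity_test(candidate, child, tau=0.2))
```

Running this comparison inside a generic secure two-party protocol is what makes the first attempt slow: every one of the roughly 3 billion symbols is an input to the secure computation.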


Privacy-Preserving Genetic Paternity Test (2 of 2)

• Improved Protocol
– ~99.5% of any two human genomes are identical
– Why don’t we compare only the remaining 0.5%?
• But… We don’t know (yet) where exactly these 0.5% occur!
• Using Private Set Intersection Cardinality for privacy-preserving comparison, it takes about 1 hour

Image from dna-testing-for-paternity.com


Private Set Intersection Cardinality (PSI-CA)

[Diagram: PSI-CA protocol run between a server and a client]
• Server input: S = {s1, …, sw}
• Client input: C = {c1, …, cw}
• Server output: ⊥
• Client output: |S ∩ C|
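
To make the PSI-CA functionality concrete, here is a toy sketch that simulates both parties in one Python process. It uses Diffie-Hellman-style commutative blinding plus shuffling, which is a common textbook construction and is not claimed to be the exact protocol of [BaldiBDGT11]. The modulus, the hash-to-group mapping, and all function names are illustrative assumptions, and the parameters are far too weak for real use.

```python
import hashlib
import math
import secrets

# Toy modulus: the Mersenne prime 2^127 - 1 (an assumption; not a secure choice).
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Map an item to an element of 1..P-1 (illustrative, not a vetted encoding)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def random_exponent() -> int:
    """Secret exponent coprime to P - 1, so x -> x^k mod P is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def shuffled(values):
    """Return the values in a random order."""
    out = list(values)
    secrets.SystemRandom().shuffle(out)
    return out


def psi_ca(server_set, client_set) -> int:
    """Return |S ∩ C| as the client would learn it; the server learns nothing."""
    a = random_exponent()  # client's secret
    b = random_exponent()  # server's secret

    # Client -> server: client items blinded with a.
    client_blinded = [pow(hash_to_group(c), a, P) for c in set(client_set)]

    # Server -> client: the same items blinded again with b and shuffled,
    # so the client cannot tell which of its items matched.
    double_blinded = shuffled(pow(x, b, P) for x in client_blinded)
    # Server -> client: server items blinded with b, also shuffled.
    server_blinded = shuffled(pow(hash_to_group(s), b, P) for s in set(server_set))

    # Client: add its own exponent to the server's items and count matches.
    server_double = {pow(y, a, P) for y in server_blinded}
    return sum(1 for x in double_blinded if x in server_double)


if __name__ == "__main__":
    print(psi_ca({"s1", "s2", "s3"}, {"s2", "s3", "c9"}))
```

The shuffle is what turns a plain intersection into a cardinality-only result: matching values can be counted but not traced back to specific client items.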


PPGPT Strategy

• In-vitro emulation – RFLP-based paternity test
– Restriction Fragment Length Polymorphism (RFLP) analysis: a difference between samples of homologous DNA molecules from differing locations of restriction enzyme sites
– DNA sample is cut into fragments by enzymes
  • Fragments separated according to their lengths by gel electrophoresis
  • Paternity test is positive if enough fragments have the same length
• RFLP-based PPGPT – Reduction to PSI-CA
– Participants: “client” (receives the result), “server” (remains oblivious)
– Public input: τ, enzymes E = {e1, ..., ej}, markers M = {mk1, ..., mkl}
– Private input: digitized genomes
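
The in-silico emulation of RFLP can be sketched as follows. This is a simplified illustration under several assumptions: each enzyme is modeled by its recognition sequence and cuts immediately before every occurrence of it, a fragment is kept only if it contains a marker, τ is treated as a minimum number of matching fragments, and a plain set intersection stands in for PSI-CA. The function names and toy sequences are hypothetical.

```python
def digest(genome: str, site: str) -> list[str]:
    """Cut the genome before every occurrence of the recognition site."""
    fragments, start = [], 0
    pos = genome.find(site, 1)
    while pos != -1:
        fragments.append(genome[start:pos])
        start = pos
        pos = genome.find(site, pos + 1)
    fragments.append(genome[start:])
    return fragments


def fragment_set(genome: str, enzymes, markers) -> set:
    """Collect (enzyme, marker, length) for every fragment containing a marker."""
    items = set()
    for site in enzymes:
        for fragment in digest(genome, site):
            for marker in markers:
                if marker in fragment:
                    items.add((site, marker, len(fragment)))
    return items


def rflp_paternity_test(client_genome, server_genome, enzymes, markers, tau) -> bool:
    """Positive if at least tau fragments have the same length (assumed use of tau)."""
    client_items = fragment_set(client_genome, enzymes, markers)
    server_items = fragment_set(server_genome, enzymes, markers)
    # In the privacy-preserving version this count would come from PSI-CA.
    return len(client_items & server_items) >= tau


if __name__ == "__main__":
    genome = "AAGAATTCTTGAATTCGG"
    print([len(f) for f in digest(genome, "GAATTC")])
    print(rflp_paternity_test(genome, genome, ["GAATTC"], ["TT"], tau=1))
```

Because each party reduces its genome to a small set of fragment descriptors before the protocol starts, the secure computation runs over tens of items instead of billions of nucleotides.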


Privacy-Preserving RFLP-based Paternity Test

[Diagram: Private Set Intersection Cardinality → Test Result (# fragments with same length)]


Remarks

• Why compare fragment lengths?
– Isn’t it more accurate to compare actual contents?
– In reality, RFLP yields “false positives” with very low probability
– This approach increases resilience to sequencing errors
• Performance Evaluation
– About 1 min pre-processing to emulate enzyme digestion process
– About 10 ms computation time on Intel Core i5 with 25 fragments
– Less than 1 s on a smartphone (Nokia N900, 600 MHz CPU)
– Extending to 50 fragments doubles computation time and increases accuracy by orders of magnitude
– Communication overhead: only a few KBs


Kawasaki Disease

• Kawasaki Disease, also known as KD or mucocutaneous lymph node syndrome, is a disease in which blood vessels throughout the body become inflamed.
• It is rare (about 1 in 1000 under age of 5).
• Its cause is not well understood, but there seem to be both genetic and environmental effects.
• It can be serious and hard to treat.


PRINCESS Study on Kawasaki

• Challenge to get enough subjects to provide statistical power for studies of a rare disease.
– No studies on KD genomics for African Americans
• Sharing genomic data is complicated by privacy rules of many institutions and governments.
• Approach: PRINCESS framework for Privacy-protecting Rare disease International Network Collaboration via Encryption through Software guard extensionS

[Chen16]


PRINCESS Framework


Genome Analysis of KD Trios

• Transmission Disequilibrium Test (TDT) is a family-based test for disease traits that uses the genotype information from both parents and a child.
• Used to test seventy-two Kawasaki disease (KD) children and their biological parents from:
– Rady Children’s Hospital San Diego (RCHSD) (N = 45),
– Emory University in Atlanta (N = 21), and
– Imperial College in London (N = 6)
• Examined > 695,784 SNPs
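
For a single biallelic SNP, the standard TDT statistic is (b − c)² / (b + c), where b and c count how often heterozygous parents did and did not transmit the allele of interest to the affected child. The sketch below computes it from trios in plain Python. The genotype encoding (two-character strings such as "Aa"), the function names, and the toy data are assumptions for illustration, and the code says nothing about how PRINCESS evaluates the test inside SGX.

```python
import math


def tdt_counts(trios, allele="A"):
    """Count transmissions (b) and non-transmissions (c) of `allele`
    from heterozygous parents, over (father, mother, child) genotypes."""
    b = c = 0
    for father, mother, child in trios:
        parents = (father, mother)
        het = sum(1 for p in parents if p[0] != p[1])
        hom_carriers = sum(1 for p in parents if p[0] == p[1] == allele)
        # Copies of the allele in the child that must come from heterozygous parents.
        transmitted = child.count(allele) - hom_carriers
        if not 0 <= transmitted <= het:
            raise ValueError("genotypes are not Mendelian-consistent")
        b += transmitted
        c += het - transmitted
    return b, c


def tdt_statistic(b: int, c: int) -> float:
    """McNemar-style chi-square statistic with 1 degree of freedom."""
    return 0.0 if b + c == 0 else (b - c) ** 2 / (b + c)


def tdt_p_value(statistic: float) -> float:
    """Upper-tail probability of a chi-square(1) variable."""
    return math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    trios = [("Aa", "aa", "Aa"), ("Aa", "Aa", "AA"), ("AA", "Aa", "Aa")]
    b, c = tdt_counts(trios)
    print(b, c, tdt_statistic(b, c), tdt_p_value(tdt_statistic(b, c)))
```

Only heterozygous parents contribute to the counts, which is why the test needs the genotypes of both parents as well as the child.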


PRINCESS Security Architecture


PRINCESS Data Management


Efficiency Hypothesis

• Solutions based on SGX are not expected to introduce significant computational overhead or major restrictions on data analysis operations.
• By contrast, both are common in software-based techniques such as the SMC (garbled circuit) FlexSC framework and the homomorphic-encryption-based HElib framework.
• SGX therefore makes secure, large-scale, intercontinental genetic analysis feasible in practice.


Performance Study

• Used simulated data to produce scale tests.
• Used AWS nodes for multiple clients (up to 12).
• Assorted conservative compromises were made in measuring crypto analytics
– For instance, HElib does not support division, so only addition and multiplication operations were measured.


Performance Comparison


Performance Breakdown


Identified SNPs


References

• [BaldiBDGT11] Countering GATTACA: Efficient and Secure Testing of Fully-Sequenced Human Genomes. Pierre Baldi, Roberta Baronio, Emiliano De Cristofaro, Paolo Gasti, and Gene Tsudik. CCS 2011.
• [Naveed15] Privacy in the Genomic Era. Muhammad Naveed, Erman Ayday, Ellen W. Clayton, Jacques Fellay, Carl A. Gunter, Jean-Pierre Hubaux, Bradley A. Malin, and XiaoFeng Wang. ACM Computing Surveys 48, 1, Article 6, August 2015. Associated online tutorial on genomics for computer scientists.


The Genomic Data Chasm

Clinical
• Tests for specialists seeking genetic markers for diagnosis or treatment of specific conditions
• Whole Genome Sequencing (WGS) for
– PCPs who want to identify high likelihood concerns
– Researchers
– Subsequent “in silico” testing

Direct to Consumer (DTC)
• Enables broad access at low cost for diverse reasons
• Examples: paternity testing and genealogy studies
• Controversial issues with quality of results and their interpretation
• Disruptive influence on clinical testing


General Practitioner Report versus Medical Geneticist Report

Design Approaches for the Display of Genetic Test Results, C Bushell, M Ferber, L Gatzke, K Johnson, V Jongeneel, K Schahl. Individualizing Medicine Conference, 2012.


Sample DTC Report on Genomic Susceptibility to Disease

23andme.com.


Details on Markers

23andme.com.


Adventurous DTC Vendors

genepartner.com.


Discussion

• What security and privacy issues are raised by DTC genomics?
• How would you like to see your DNA data managed? What about the DNA of your relatives?
• Should it be legal to obtain your DNA without your consent?
