Post on 21-Jan-2021
463.9 Health Information Technology
Computer Security II, CS463/ECE424
University of Illinois
• Midterm
  – March 10 (Tue), during class, this room (SC 0216)
  – Closed book, closed notes, no calculator
  – 12 multiple-choice questions + 6 essay questions (sample questions posted via Git)
• Tips
  – Focus on slides, watch videos (https://echo360.org/)
  – Practice with quiz questions and sample questions
Announcement
• Privacy and genomic data
• Privacy protecting genomic research using
  – Private set intersection
  – SGX
Outline
Genome
• Contains all of the biological information needed to build and maintain a “living example” of an organism
• Encoded in DNA, one polymer of nucleotides
  – A, G, C, T
• Human Genome:
  – Approximately 3 billion nucleotides
  – Stored in 23 chromosome pairs (plus mtDNA)
[BaldiBDGT11]
Cost Per Genome
• Better understanding of human genome
• Many individuals have access to key parts of their genomes
• Precision medicine enabled
• Testing possible not only in-vitro but also in-silico
New Frontiers
• Genomic data carry sensitive information that may reveal
  – identity,
  – predisposition to diseases,
  – and even facial features.
• Disclosure may propagate the privacy risks to blood relatives.
• Individuals have marked differences in the way they want their data utilized for research.
• Data are irrevocable once they are disseminated.
• New privacy threats may emerge over time with new discoveries of human genetics and the advance of attack methods.
  – Many aggregate results have been removed from the public domain hosted by NIH.
Privacy Concerns
Genetic Exceptionalism: How Special is Genomic Data?
Evans, James P., and Wylie Burke. "Genetic exceptionalism. Too much of a good thing?." Genetics in Medicine 10, no. 7 (2008): 500-501.
McGuire, Amy L., Rebecca Fisher, Paul Cusenza, Kathy Hudson, Mark A. Rothstein, Deven McGraw, Stephen Matteson, John Glaser, and Douglas E. Henley. "Confidentiality, privacy, and security of genetic and genomic test information in electronic health records: points to consider." Genetics in Medicine 10, no. 7 (2008): 495-499.
[Naveed15]
• Homomorphic encryption
• Differential privacy
• Secret sharing
• Secure multi-party computation (MPC)
  – Garbled circuits
• Secure two-party computation
  – Private Set Intersection (PSI)
• Trusted execution environments
  – SGX
PETs for Computation on Genomic Data
Strawman Approach for Paternity Test
• On average, ~99.5% of any two human genomes are identical
• Parents and children have even more similar genomes
• Compare candidate’s genome with that of the alleged child:
  – Test positive if % of matching nucleotides is > 99.5 + τ
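The strawman rule can be written down in a few lines. This is a minimal sketch, assuming both genomes are already aligned, equal-length strings over A, C, G, T and that τ is given in percentage points. The function names are hypothetical and chosen only for illustration.

```python
def match_percent(genome_a: str, genome_b: str) -> float:
    """Percentage of positions at which two aligned genomes agree."""
    if len(genome_a) != len(genome_b) or not genome_a:
        raise ValueError("genomes must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(genome_a, genome_b))
    return 100.0 * matches / len(genome_a)


def strawman_paternity_test(candidate: str, child: str, tau: float) -> bool:
    """Positive if the match percentage exceeds 99.5 + tau.

    tau is assumed to be expressed in percentage points.
    """
    return match_percent(candidate, child) > 99.5 + tau
```

Note that this plain comparison needs both genomes in the clear on one machine, which is exactly the privacy problem the following protocols try to avoid.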
First-Attempt Privacy-Preserving Protocol
• Use an appropriate secure two-party protocol for the comparison
• PROs: High-accuracy and error resilience
• CONs: Performance not promising (3 billion symbols in input)
  – Experiments showed computation takes a few days
Privacy-Preserving Genetic Paternity Test (1 of 2)
[BaldiBDGT11]
• Improved Protocol
  – ~99.5% of any two human genomes are identical
  – Why don’t we compare only the remaining 0.5%?
But… We don’t know (yet) where exactly these 0.5% occur!
Using Private Set Intersection Cardinality for privacy-preserving comparison, it takes about 1 hour
Privacy-Preserving Genetic Paternity Test (2 of 2)
Image from dna-testing-for-paternity.com
Private Set Intersection Cardinality (PSI-CA)
[Diagram: the Server holds S = {s1, …, sw} and the Client holds C = {c1, …, cw}. After the protocol the Client learns |S ∩ C| and the Server learns nothing (⊥).]
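To make the PSI-CA functionality concrete, here is a toy sketch based on commutative blinding by modular exponentiation, a classic semi-honest technique. It is not the PSI-CA construction used in [BaldiBDGT11]. The modulus, the function names, and running both parties inside one function are assumptions made purely for illustration, and the parameters are not secure.

```python
import hashlib
import math
import secrets

# Toy modulus (the Mersenne prime 2^127 - 1). Assumption for illustration
# only: it is far too small and not a vetted group for real use.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Map an item to an element of Z_P^* (range 1..P-1)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def random_exponent() -> int:
    """Secret exponent coprime to P-1, so x -> x^k is a bijection on Z_P^*."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def blind(values, key):
    """Raise every value to the secret exponent and shuffle the result."""
    out = [pow(v, key, P) for v in values]
    secrets.SystemRandom().shuffle(out)
    return out


def psi_ca(client_items: set, server_items: set) -> int:
    """Return |S ∩ C| as the client would compute it."""
    a = random_exponent()  # client's secret
    b = random_exponent()  # server's secret
    # Client -> Server: H(c)^a for every client item.
    msg1 = blind([hash_to_group(c) for c in client_items], a)
    # Server -> Client: (H(c)^a)^b, shuffled, plus H(s)^b for its own items.
    client_double = blind(msg1, b)
    server_single = blind([hash_to_group(s) for s in server_items], b)
    # Client: finish blinding the server's values and count the collisions.
    server_double = {pow(v, a, P) for v in server_single}
    return sum(1 for v in client_double if v in server_double)
```

Because the server shuffles the doubly blinded values, the client can count matches but cannot tell which of its items matched. In this sketch each side still learns the size of the other side's set.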
• In-vitro emulation: RFLP-based paternity test
  – Restriction Fragment Length Polymorphism (RFLP) analysis: a difference between samples of homologous DNA molecules from differing locations of restriction enzyme sites
  – DNA sample is cut into fragments by enzymes
    • Fragments separated according to their lengths by gel electrophoresis
    • Paternity test is positive if enough fragments have the same length
• RFLP-based PPGPT: reduction to PSI-CA
  – Participants: “client” (receives the result), “server” (remains oblivious)
  – Public input: threshold τ, enzymes E = {e1, ..., ej}, markers M = {mk1, ..., mkl}
  – Private input: digitized genomes
PPGT Strategy
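The in-silico emulation of RFLP can be sketched as follows. This is a simplified illustration, not the implementation from [BaldiBDGT11]: it assumes each enzyme cuts just before its recognition sequence, that a marker is a short sequence that tags a fragment, and that "enough fragments" means at least τ shared entries. All names and the plain set intersection (standing in for PSI-CA) are hypothetical.

```python
def digest(genome: str, site: str) -> list:
    """Cut the genome just before each occurrence of the recognition site."""
    fragments, prev = [], 0
    pos = genome.find(site, 1)
    while pos != -1:
        fragments.append(genome[prev:pos])
        prev = pos
        pos = genome.find(site, pos + 1)
    fragments.append(genome[prev:])
    return fragments


def fragment_set(genome: str, enzymes: dict, markers: dict) -> set:
    """Build the set of (enzyme, marker, fragment length) entries."""
    items = set()
    for enzyme_name, site in enzymes.items():
        for fragment in digest(genome, site):
            for marker_name, marker_seq in markers.items():
                if marker_seq in fragment:
                    items.add((enzyme_name, marker_name, len(fragment)))
    return items


def rflp_paternity_test(candidate, child, enzymes, markers, tau) -> bool:
    """Positive if at least tau marked fragments have the same length.

    The plain intersection below is where a private protocol such as
    PSI-CA would be run between client and server.
    """
    shared = fragment_set(candidate, enzymes, markers) & fragment_set(
        child, enzymes, markers
    )
    return len(shared) >= tau
```

Each party can build its own fragment set locally from the public enzymes and markers, so only the sets, never the genomes, enter the private comparison.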
Privacy-Preserving RFLP-based Paternity Test
[Diagram: Private Set Intersection Cardinality yields the test result (number of fragments with the same length).]
• Why compare fragment lengths?
  – Isn’t it more accurate to compare actual contents?
  – In reality, RFLP yields “false positives” with very low probability
  – This approach increases resilience to sequencing errors
• Performance Evaluation
  – About 1 min pre-processing to emulate enzyme digestion process
  – About 10 ms computation time on Intel Core i5 with 25 fragments
  – Less than 1 s on a smartphone (Nokia N900, 600 MHz CPU)
  – Extending to 50 fragments doubles computation time and increases accuracy by orders of magnitude
  – Communication overhead: only a few KBs
Remarks
• Kawasaki Disease, also known as KD or mucocutaneous lymph node syndrome, is a disease in which blood vessels throughout the body become inflamed.
• It is rare (about 1 in 1000 under the age of 5).
• Its cause is not well understood, but there seem to be both genetic and environmental effects.
• It can be serious and hard to treat.
Kawasaki Disease
• Challenge to get enough subjects to provide statistical power for studies of a rare disease.
  – No studies on KD genomics for African Americans
• Sharing genomic data is complicated by privacy rules of many institutions and governments.
• Approach: PRINCESS framework for Privacy-protecting Rare disease International Network Collaboration via Encryption through Software guard extensionS
PRINCESS Study on Kawasaki
[Chen16]
PRINCESS Framework
• Transmission Disequilibrium Test (TDT) is a family-based test for disease traits that uses the genotype information from both parents and a child.
• Used to test seventy-two Kawasaki disease (KD) children and their biological parents from:
  – Rady Children’s Hospital San Diego (RCHSD) (N = 45),
  – Emory University (N = 21) in Atlanta, and
  – Imperial College in London (N = 6)
• Examined > 695,784 SNPs
Genome Analysis of KD Trios
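The slides do not give the TDT formula. The sketch below uses the standard McNemar-type statistic, χ² = (b − c)² / (b + c), where b and c count how often heterozygous parents transmit or do not transmit the allele of interest to the affected child. The genotype encoding (0, 1, or 2 copies of the allele) and the function names are assumptions for illustration. This is not the PRINCESS code, which evaluates the test inside an SGX enclave.

```python
def tdt_counts(trios):
    """Count transmissions from heterozygous parents for one SNP.

    Each trio is (father, mother, child), every genotype being the number
    of copies (0, 1, or 2) of the allele of interest.
    Returns (b, c): b = allele transmitted, c = allele not transmitted.
    """
    b = c = 0
    for father, mother, child in trios:
        if father == 1 and mother == 1:
            # Both parents heterozygous: the child's copy count equals the
            # number of times the allele was transmitted.
            b += child
            c += 2 - child
        elif father == 1 or mother == 1:
            other = mother if father == 1 else father
            # A homozygous parent contributes other // 2 copies for sure.
            transmitted = child - other // 2
            if transmitted not in (0, 1):
                raise ValueError("genotypes are not Mendelian-consistent")
            b += transmitted
            c += 1 - transmitted
    return b, c


def tdt_statistic(b: int, c: int) -> float:
    """McNemar-type chi-square statistic with 1 degree of freedom."""
    return (b - c) ** 2 / (b + c) if (b + c) else 0.0
```

Only the two counts b and c per SNP are needed for the statistic, which is why the test can be evaluated over counts pooled from several sites.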
PRINCESS Security Architecture
PRINCESS Data Management
• Solutions based on SGX are not expected to introduce significant computational overhead or big restrictions on data analysis operations.
• By contrast, these are common to software-based techniques such as the SMC (garbled circuit) FlexSC framework and Homomorphic Encryption based HElib framework.
• SGX therefore makes secure large-scale, inter-continental, genetic analysis feasible in practice.
Efficiency Hypothesis
• Used simulated data to produce scale tests.
• Used AWS nodes for multiple clients (up to 12).
• Assorted conservative compromises were made in measuring crypto analytics
  – For instance, HElib does not support division, so only addition and multiplication operations were measured.
Performance Study
Performance Comparison
Performance Breakdown
Identified SNPs
• [BaldiBDGT11] Countering GATTACA: Efficient and Secure Testing of Fully-Sequenced Human Genomes. Pierre Baldi, Roberta Baronio, Emiliano De Cristofaro, Paolo Gasti, and Gene Tsudik. CCS 2011.
• [Naveed15] Privacy in the Genomic Era. Muhammad Naveed, Erman Ayday, Ellen W. Clayton, Jacques Fellay, Carl A. Gunter, Jean-Pierre Hubaux, Bradley A. Malin, and XiaoFeng Wang. ACM Computing Surveys 48, 1, Article 6, August 2015. Associated online tutorial on genomics for computer scientists.
References
The Genomic Data Chasm
Clinical
• Tests for specialists seeking genetic markers for diagnosis or treatment of specific conditions
• Whole Genome Sequencing (WGS) for
  – PCPs who want to identify high likelihood concerns
  – Researchers
  – Subsequent “in silico” testing
Direct to Consumer (DTC)
• Enables broad access at low cost for diverse reasons
• Examples: paternity testing and genealogy studies
• Controversial issues with quality of results and their interpretation
• Disruptive influence on clinical testing
General Practitioner Report versus Medical Geneticist Report
Design Approaches for the Display of Genetic Test Results, C Bushell, M Ferber, L Gatzke, K Johnson, V Jongeneel, K Schahl. Individualizing Medicine Conference, 2012.
Sample DTC Report on Genomic Susceptibility to Disease
23andme.com.
Details on Markers
23andme.com.
Adventurous DTC Vendors
genepartner.com.
• What security and privacy issues are raised by DTC genomics?
• How would you like to see your DNA data managed? What about the DNA of your relatives?
• Should it be legal to obtain your DNA without your consent?
Discussion