Anonymity through Data cubes
![Page 1: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/1.jpg)
Linked2Safety Project (FP7-ICT-2011-7 – 5.3)
A NEXT-GENERATION, SECURE LINKED DATA MEDICAL INFORMATION SPACE FOR SEMANTICALLY-INTERCONNECTING ELECTRONIC HEALTH RECORDS AND CLINICAL TRIALS SYSTEMS ADVANCING PATIENTS SAFETY IN CLINICAL RESEARCH
12th International Conference on Bioinformatics and Bioengineering, Larnaka
Anonymity through Data cubes
Athos Antoniades
![Page 2: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/2.jpg)
FP7, ICT-2011 – 5.3 Page 2
Introduction
• Why Share Data?
• What are the current legal and ethical limitations?
• How have scientists shared medical data so far?
• Key Problems
• Perturbation
• Cell Suppression
![Page 3: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/3.jpg)
The Problem
Why share data:
• Replication Testing
• Statistical Power
• Multiple Testing Problem
Legal and Ethical Issues:
• Anonymization vs Pseudonymization
• Limitations derived from the consent form signed by subjects
• Other regional, study-specific, or subject-specific issues
![Page 4: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/4.jpg)
How have scientists shared medical data
Contingency Table and Data Cube example

| | aa | aA | AA |
| --- | --- | --- | --- |
| Case | U00 | U01 | U02 |
| Control | U10 | U11 | U12 |

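
A minimal sketch of how such a contingency table could be built from subject-level records, so that only the cell counts (U00 ... U12) are shared. The records below are hypothetical and not data from the talk; a data cube extends the same counting to more than two dimensions.

```python
from collections import Counter

# Hypothetical subject-level records: (status, genotype). Not data from the talk.
records = [
    ("Case", "aa"), ("Case", "aA"), ("Case", "aA"), ("Case", "AA"),
    ("Control", "aa"), ("Control", "aa"), ("Control", "aA"), ("Control", "AA"),
]

ROWS = ("Case", "Control")
COLS = ("aa", "aA", "AA")


def contingency_table(rows):
    """Count subjects per (status, genotype) cell."""
    counts = Counter(rows)
    return [[counts[(r, c)] for c in COLS] for r in ROWS]


table = contingency_table(records)
for label, row in zip(ROWS, table):
    print(label, row)
```

Only `table` would leave the data provider in this sketch; the per-subject `records` stay local.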
![Page 5: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/5.jpg)
16 year old widow Problem
A paper that analyzes data from a specific study reports:

| Age / Marital Status | Married | Widowed | Single |
| --- | --- | --- | --- |
| 0-16 | 0 | 1 | 50 |
| 18-24 | 10 | 5 | 50 |
| 25-34 | 40 | 7 | 40 |
| 35~ | 60 | 15 | 20 |

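
The single widowed subject in the 0-16 band is what makes this table risky: a cell count of 1 points at one person. A small sketch of how such cells could be flagged before publication follows; the function name `risky_cells` and the cutoff `k` are illustrative assumptions, not part of the Linked2Safety tooling.

```python
AGE_BANDS = ["0-16", "18-24", "25-34", "35~"]
STATUSES = ["Married", "Widowed", "Single"]

# Counts as reported on the slide.
counts = [
    [0, 1, 50],
    [10, 5, 50],
    [40, 7, 40],
    [60, 15, 20],
]


def risky_cells(table, k=1):
    """Return (age band, status, count) for non-empty cells with count <= k."""
    return [
        (AGE_BANDS[i], STATUSES[j], n)
        for i, row in enumerate(table)
        for j, n in enumerate(row)
        if 0 < n <= k
    ]


print(risky_cells(counts))  # [('0-16', 'Widowed', 1)]
```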
![Page 8: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/8.jpg)
Categorization Differences
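Paper 1 that analyzes data from a specific study reports:

| Age / Marital Status | Married | Widowed | Single |
| --- | --- | --- | --- |
| 0-16 | NA | NA | 50 |
| 18-24 | 10 | 7 | 50 |
| 25-34 | 40 | 7 | 40 |
| 35~ | 60 | 15 | 20 |

Paper 2 that analyzes data from the same study reports:

| Age / Marital Status | Married | Widowed | Single |
| --- | --- | --- | --- |
| 0-16 | NA | NA | 50 |
| 18-25 | 10 | 8 | 50 |
| 26-35 | 45 | 7 | 40 |
| 36~ | 55 | 14 | 20 |
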
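
One way to read this slide: when two papers bin the same subjects differently, subtracting one table from the other isolates narrow groups that neither paper reported on its own. The sketch below works through that arithmetic for the two tables, assuming both papers cover exactly the same subjects, ages are whole years, and the counts are exact.

```python
STATUSES = ["Married", "Widowed", "Single"]

# Counts as reported on the slide (suppressed 0-16 row omitted).
paper1 = {"18-24": [10, 7, 50], "25-34": [40, 7, 40], "35~": [60, 15, 20]}
paper2 = {"18-25": [10, 8, 50], "26-35": [45, 7, 40], "36~": [55, 14, 20]}


def difference(a, b):
    """Element-wise a - b for two rows of counts."""
    return [x - y for x, y in zip(a, b)]


# Subjects aged exactly 25: in "18-25" (paper 2) but not in "18-24" (paper 1).
aged_25 = difference(paper2["18-25"], paper1["18-24"])
# Subjects aged exactly 35: in "35~" (paper 1) but not in "36~" (paper 2).
aged_35 = difference(paper1["35~"], paper2["36~"])

# Consistency check: the middle band gains the 35-year-olds and loses the 25-year-olds.
assert difference(paper2["26-35"], paper1["25-34"]) == difference(aged_35, aged_25)

print(dict(zip(STATUSES, aged_25)))  # {'Married': 0, 'Widowed': 1, 'Single': 0}
print(dict(zip(STATUSES, aged_35)))  # {'Married': 5, 'Widowed': 1, 'Single': 0}
```

Under these assumptions the two tables together reveal a single widowed 25-year-old and a single widowed 35-year-old, even though every published cell is 7 or larger.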
![Page 9: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/9.jpg)
Perturbation and Cell Suppression
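Original Data

| Age / Marital Status | Married | Widowed | Single |
| --- | --- | --- | --- |
| 0-16 | 0 | 1 | 50 |
| 18-24 | 10 | 7 | 50 |
| 25-34 | 40 | 7 | 40 |
| 35~ | 60 | 15 | 20 |

Perturbation (±1) and Cell Suppression (<5)

| Age / Marital Status | Married | Widowed | Single |
| --- | --- | --- | --- |
| 0-16 | NA | NA | 51 |
| 18-24 | 9 | 8 | 49 |
| 25-34 | 40 | 7 | 41 |
| 35~ | 61 | 14 | 21 |
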
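
A minimal sketch of the two operations on this slide, applied to the original table. The order (suppress on the original count, then perturb the remaining cells), the uniform integer noise, and the name `anonymize` are assumptions made for illustration; the slides do not specify how Linked2Safety implements them.

```python
import random

# Original counts from the slide.
original = [
    [0, 1, 50],
    [10, 7, 50],
    [40, 7, 40],
    [60, 15, 20],
]


def anonymize(table, noise=1, threshold=5, seed=None):
    """Suppress cells below `threshold`, then add integer noise in [-noise, noise]."""
    rng = random.Random(seed)
    result = []
    for row in table:
        new_row = []
        for count in row:
            if count < threshold:
                new_row.append(None)  # suppressed cell, shown as NA on the slide
            else:
                # Assumption: uniform integer noise, clamped so counts stay non-negative.
                new_row.append(max(0, count + rng.randint(-noise, noise)))
        result.append(new_row)
    return result


released = anonymize(original, noise=1, threshold=5, seed=0)
for row in released:
    print(["NA" if cell is None else cell for cell in row])
```

Each run with a different `seed` yields a different released table, all within ±1 of the original in the unsuppressed cells.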
![Page 10: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/10.jpg)
Evaluation
• Most common parameters tested:
  • Perturbation: [0], [-1,1], [-3,3], [-5,5], [-10,10]
  • Cell Suppression: <0, <=1, <=3, <=5, <=10
• Standard main effect test using Chi Square
• Pearson’s Correlation Coefficient used to evaluate the deviation of each parameter combination from the original results.
• A priori defined threshold for Pearson’s correlation coefficient: <=0.95.
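
The evaluation loop can be sketched roughly as follows: compute a chi-square statistic per variable on the original tables, repeat on perturbed tables, and correlate the two sets of statistics. The tables are hypothetical (not MASTOS data), only perturbation is varied here, and the helper names are assumptions; the resulting `r` would then be compared against the 0.95 threshold.

```python
import math
import random


def chi_square(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_under_noise(tables, noise, seed=None):
    """Correlate chi-square statistics before and after +/- `noise` perturbation."""
    rng = random.Random(seed)
    original = [chi_square(t) for t in tables]
    noisy = [
        chi_square([[max(0, n + rng.randint(-noise, noise)) for n in row] for row in t])
        for t in tables
    ]
    return pearson_r(original, noisy)


# Hypothetical case/control x genotype tables, one per variable.
tables = [
    [[120, 200, 80], [100, 210, 90]],
    [[60, 180, 160], [90, 190, 120]],
    [[30, 150, 220], [35, 145, 220]],
    [[200, 150, 50], [150, 170, 80]],
]

for noise in (0, 1, 3, 5, 10):
    print(noise, round(correlation_under_noise(tables, noise, seed=42), 3))
```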
![Page 11: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/11.jpg)
Evaluating Parameters with a matrix of graphs
![Page 12: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/12.jpg)
Linked2Safety’s Data Analysis Space
Objectives:
• Design and develop the data mining techniques and the scalable infrastructure for the identification of phenotypic and genetic associations related to adverse events.
• Develop new and implement existing state of the art analytical approaches for genetic data.
• Define and implement the knowledge extraction and filtering mechanisms and the knowledge base.
• Integrate the knowledge base into a lightweight decision support system (Adverse events early detection mechanism).
![Page 13: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/13.jpg)
Data Analysis Steps
![Page 14: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/14.jpg)
Quality Control Subspace
Provides the tools for identifying and removing erroneous data or data that do not conform to the quality standards that a user might define.
Tools:
• Hardy-Weinberg Equilibrium Test
• Allele Frequency Test
• Missing Data Test
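
As an illustration of the first two tools, here is a sketch of a chi-square Hardy-Weinberg check and a minor allele frequency calculation for one biallelic marker. The function names and genotype counts are hypothetical and this is not the platform's implementation. With one degree of freedom, a statistic above 3.841 corresponds to p < 0.05.

```python
def allele_frequency(n_aa, n_aA, n_AA):
    """Frequency of allele A from genotype counts."""
    n = n_aa + n_aA + n_AA
    return (2 * n_AA + n_aA) / (2 * n)


def minor_allele_frequency(n_aa, n_aA, n_AA):
    """Frequency of the rarer of the two alleles."""
    p = allele_frequency(n_aa, n_aA, n_AA)
    return min(p, 1 - p)


def hwe_chi_square(n_aa, n_aA, n_AA):
    """Chi-square statistic comparing observed genotypes to HWE expectations."""
    n = n_aa + n_aA + n_AA
    p = allele_frequency(n_aa, n_aA, n_AA)
    q = 1 - p
    expected = (q * q * n, 2 * p * q * n, p * p * n)
    observed = (n_aa, n_aA, n_AA)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Hypothetical genotype counts (aa, aA, AA) for one marker.
stat = hwe_chi_square(30, 40, 30)
print(round(stat, 3), "deviates from HWE" if stat > 3.841 else "consistent with HWE")
```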
![Page 15: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/15.jpg)
Feature Selection Subspace
Provides the tools for removing redundant or irrelevant features from a dataset.
Tools:
• Rough Set Feature Selection
• Information Gain Feature Selection
• Chi Squared Feature Selection
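
To make one of these concrete, the sketch below computes information gain for a categorical feature against an outcome label; features with gain near zero are candidates for removal. The toy data and function names are assumptions for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy obtained by splitting on `feature`."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: adverse event (1) or not (0) for four subjects.
adverse_event = [1, 1, 0, 0]
informative = ["yes", "yes", "no", "no"]   # separates the outcome perfectly
irrelevant = ["yes", "no", "yes", "no"]    # carries no information about it

print(information_gain(informative, adverse_event))
print(information_gain(irrelevant, adverse_event))
```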
![Page 16: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/16.jpg)
Data Analysis Steps
![Page 17: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/17.jpg)
Single Hypothesis Testing Subspace
Provides the tools for performing single hypothesis testing on a dataset and testing for associations.
Tools:
• Pearson’s Chi Square Test
• Fisher’s Exact Test
• Odds Ratio
• Binomial Logistic Regression
• Linkage Disequilibrium
• Genetic Region Based Association Testing
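
Two of these tests are small enough to sketch directly for a 2x2 table [[a, b], [c, d]] (for example exposure by adverse event). This is a plain standard-library illustration with hypothetical counts, not the platform's code; the two-sided Fisher p-value sums the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio a*d / (b*c); assumes b and c are non-zero."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-9))


# Hypothetical 2x2 table.
print(odds_ratio(3, 1, 1, 3))                        # 9.0
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```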
![Page 18: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/18.jpg)
Data Mining Subspace
Provides the tools for performing data mining analyses on a dataset and extracting association rules.
Tools:
• Association Rules (Apriori)
• Decision Trees with Percentage Split (C4.5)
• Decision Trees with Cross Validation (C4.5)
• Random Forest with Percentage Split
• Random Forest with Cross Validation
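
Association rules are scored by support and confidence, the two measures Apriori uses to prune candidates. The sketch below computes both for a single rule over hypothetical patient profiles; it is not a full Apriori implementation, and the item names are invented for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical patient profiles, each a set of observed attributes.
profiles = [
    {"smoker", "hypertension"},
    {"smoker", "hypertension", "stroke"},
    {"smoker"},
    {"hypertension"},
]

# Rule: smoker -> hypertension
print(support(profiles, {"smoker", "hypertension"}))
print(confidence(profiles, {"smoker"}, {"hypertension"}))
```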
![Page 19: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/19.jpg)
Data Analysis Space Interactions
![Page 20: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/20.jpg)
Data Analysis Steps
![Page 21: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/21.jpg)
Knowledge Extraction and Filtering Mechanism
Knowledge Extraction Mechanism
• This mechanism is responsible for storing statistically significant associations and important association rules in the Linked2Safety knowledge database
• Has two steps:
  • Logging system
  • Storing important knowledge
Filtering mechanism
• This mechanism allows users to insert or delete associations and association rules
![Page 22: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/22.jpg)
Adverse Event Early Detection Mechanism
• Uses the knowledge in the L2S knowledge base
• Runs in the background to identify new associations and association rules
• Reruns analyses when updated datasets are available
• Creates alerts for patient profiles associated with adverse events
![Page 23: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/23.jpg)
Linked2Safety’s Data Analysis Platform
![Page 24: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/24.jpg)
Linked2Safety’s Data Analysis Platform Workflow Screenshot
![Page 25: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/25.jpg)
Patterns Discovery Common Variable Selection
Overlapping non-genetic data of at least 2 data providers:
Variables:
• Age
• Gender
• BMI
• Smoking Ever
• Dyslipidemia
• Diabetes
• Diabetes type I
• Diabetes type II
• Anemia
• Depressive personality disorder
• Major depressive disorder
• Schizotypal personality disorder
• Weight gain
• Headaches
• Gastrointestinal symptoms
• Ophthalmological problems
• Type of ophthalmological condition
• High blood pressure
• Heart conditions exist
• Type of heart condition
• Hypertension
• Myocardial infarction
• Stroke
• Coronary heart disease
![Page 26: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/26.jpg)
Conclusion and future work on utilizing data cubes
We were able to identify, for a given dataset, the maximum noise that can be added to the data without significantly affecting the outcomes.
The results presented are relevant only to MASTOS; for any other dataset, the analytical approach described must be repeated to determine the maximum noise that can be added.
Further investigation is necessary to identify the minimum parameter settings that satisfy legal and ethical requirements.
![Page 27: Anonymity through Data cubes](https://reader033.fdocuments.in/reader033/viewer/2022051700/568163c5550346895dd4f237/html5/thumbnails/27.jpg)
Who to Contact
Athos Antoniades
University of Cyprus
email: [email protected]