Panel Discussion: Big Data; Lucila Ohno-Machado, MD, PhD
The analyses upon which this publication is based were performed under Contract Number HHSM-500-2009-00046C sponsored by the Center for Medicare and Medicaid Services, Department of Health and Human Services.
Panel Discussion: Big Data at the iDASH Center
Lucila Ohno-Machado, MD, PhD
Division of Biomedical Informatics
University of California San Diego
Editor-in-Chief, Journal of the American Medical Informatics Association
Wireless Health 2012
21st Century Healthcare
What is the influence of genetics, environment?
Which therapies work best for individual patients?
Patient-Centered Outcomes Research
• Genome
  – Sequencing data
• Phenotype
  – Personal monitoring
    • Blood pressure, glucose
  – Personal health records
  – Behavior monitoring
    • Adherence to medication, exercise
• Environment
  – Air sensors, food quality
  – Location
Source: DOE
Where does knowledge come from?
• Small controlled studies with strict eligibility criteria
• Does this apply to my patient?
Hopefully, but we need a lot of data to answer this question:
• We need to build infrastructure to access large data repositories
  – Lower the barriers to share data
• We need to share tools to analyze the data
  – Algorithms and computational facilities
Big Data, Small Data, and Other Data
• Data integration across biological scales
• Data analysis from multiple sources
• Data ‘anonymization’ and privacy preservation
5/18/2012
[Figure: data across biological scales]
Genotype • Genome
Transcription
RNA • Transcriptome
Translation
Protein • Proteome
Biomarkers • Lab
Phenotype • Clinical Data
Population • Registries
Clinical Translational Science
• Integration of Clinical Data Warehouses from 5 University of California Medical Centers and affiliated institutions (>10 million patients)
  – Aggregate and individual-level patient data will be accessible according to data use agreements and IRB approval
• Objectives
  – Monitor patient safety
  – Improve outcomes
  – Promote research
Funded by the UC Office of the President to the NIH-funded CTSAs
Data for Personalized Medicine
Handling Protected Health Information - Secure Electronic Environment
• Electronic Health Records
• Genetic Data
Prevention, Diagnosis and Therapy
– Genetic predisposition
– Biomarkers
– Pharmacogenomics
– Health records
– Sensors
Sharing Data
• Data use agreements across institutions
  – Limited and complicated
  – Specific to a particular study
  – Resources for sharing are limited
  – Security/privacy constraints are hard for small institutions to follow
• Sharing data today
  – Little incentive
  – Only one model: users download data
  – Yes/No decision on sharing
iDASH
Mission
“A national center for biomedical computing that develops new algorithms, open-source tools, computational infrastructure, and services that will enable biomedical and behavioral researchers nationwide to integrate Data for Analysis, ‘anonymization,’ and Sharing”
Models for Data Sharing
• Cloud Storage: data exported for computation elsewhere
  – Users download data from the cloud
• Cloud Compute and Virtualization: computation goes to the data
  – Users analyze data in the cloud
  – Users download virtual machines
Funded by NIH U54HL108460
Models for Sharing Data Access
Supported by the NIH Grant U54 HL108460 to the University of California, San Diego
04/10/2023
[Figure: roles and resources in the iDASH sharing environment, built up over four slides. Roles: Data Owner, Tool Creator, System Creator, User A, User B, User C. Resources: data 1, data 2, data C; tool 1, tool 2, tool 3, tool A; VM 1, VM 2. Data contributors are labeled DUA and QA; tool and VM contributors are labeled QA; users are labeled DUA. Access control sits in front of the hosted resources.]
• DUA: Data Use Agreement
• QA: Quality Assurance
• MODEL 1. User downloads iDASH data
• MODEL 2. User computes with iDASH hosted data in iDASH environment
• MODEL 3. User performs iDASH computation in his own environment
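The three models can be written down as a small access check. This is only a sketch, not iDASH code: the names `SharingModel`, `NEEDS_USER_DUA`, `may_proceed`, and `data_leaves_idash` are hypothetical. The rule that a signed user DUA is required only for Models 1 and 2 is an assumption read off the diagram, where the User/DUA label appears next to those two models.

```python
from enum import Enum


class SharingModel(Enum):
    DOWNLOAD_DATA = 1     # MODEL 1: user downloads iDASH data
    COMPUTE_IN_IDASH = 2  # MODEL 2: user computes on hosted data inside iDASH
    LOCAL_COMPUTE = 3     # MODEL 3: user runs iDASH computation in own environment


# Assumption: the diagram draws a user DUA for Models 1 and 2 only.
NEEDS_USER_DUA = {SharingModel.DOWNLOAD_DATA, SharingModel.COMPUTE_IN_IDASH}


def may_proceed(model, signed_duas, user):
    """Return True if `user` may use `model`, given the users with a signed DUA."""
    if model in NEEDS_USER_DUA:
        return user in signed_duas
    return True


def data_leaves_idash(model):
    """Only Model 1 exports hosted data out of the iDASH environment."""
    return model is SharingModel.DOWNLOAD_DATA


signed = {"user_a"}
print(may_proceed(SharingModel.DOWNLOAD_DATA, signed, "user_a"))
print(may_proceed(SharingModel.COMPUTE_IN_IDASH, signed, "user_b"))
print(data_leaves_idash(SharingModel.LOCAL_COMPUTE))
```

In this reading, Model 2 keeps hosted data in place and Model 3 moves only the tools (virtual machines) to the user.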
Quality Improvement, Health Services Research
Count queries and statistics across data warehouses
• How many patients over 65 are on Warfarin or Dabigatran?
• What are the major and minor bleeding rates for patients on these drugs?
[Figure: a user requests data for Quality Improvement or Research ("Are the data accessible?"); Trusted Broker(s) and a Security Entity provide identity and trust management and policy enforcement in front of diverse healthcare entities in 3 different states (federal, state, private).]
AHRQ R01HS19913 / EDM forum
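A count query across warehouses can be sketched as follows: each site evaluates the question locally and returns only a number, and a coordinator adds the numbers up. This is a minimal illustration under stated assumptions, not the project's implementation. The `Patient` record, the site names, and the functions `local_count` and `federated_count` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    age: int
    drugs: frozenset  # names of drugs the patient is on


def local_count(patients, min_age, drug_names):
    """Run inside one warehouse; only the resulting count leaves the site."""
    wanted = {d.lower() for d in drug_names}
    return sum(
        1
        for p in patients
        if p.age > min_age and wanted & {d.lower() for d in p.drugs}
    )


def federated_count(sites, min_age, drug_names):
    """Ask every site for its local count and return per-site counts and the total."""
    per_site = {
        name: local_count(patients, min_age, drug_names)
        for name, patients in sites.items()
    }
    return per_site, sum(per_site.values())


# Hypothetical toy data standing in for two warehouses.
sites = {
    "site_a": [Patient(70, frozenset({"Warfarin"})), Patient(60, frozenset({"Warfarin"}))],
    "site_b": [Patient(80, frozenset({"Dabigatran", "Aspirin"})), Patient(66, frozenset({"Aspirin"}))],
}
per_site, total = federated_count(sites, 65, ["Warfarin", "Dabigatran"])
print(per_site, total)
```

A patient who appears in more than one warehouse would be counted more than once in this simple sum.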
Adjusting for Confounders
Distributed regression models
Wu Y et al. Grid Binary LOgistic REgression (GLORE): Building Shared Models Without Sharing Data. JAMIA 2012
[Figure: a user requests data for Quality Improvement or Research ("Are the data accessible?"); Trusted Broker(s) and a Security Entity provide identity and trust management and policy enforcement in front of diverse healthcare entities in 3 different states (federal, state, private).]
AHRQ R01HS19913 / EDM forum
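The idea of building a shared model without sharing data can be illustrated with a logistic regression fitted by Newton-Raphson. Each site computes the gradient and Hessian contributions of its own records, and only those summaries are added centrally. The sketch below is written in the spirit of GLORE and is not the published code. The function names and toy data are hypothetical, and a real deployment would also need secure transport and checks this example omits.

```python
import math


def local_summaries(X, y, beta):
    """Run at one site: this site's gradient and X'WX contribution at `beta`."""
    p = len(beta)
    grad = [0.0] * p
    hess = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        z = sum(b * v for b, v in zip(beta, xi))
        mu = 1.0 / (1.0 + math.exp(-z))
        w = mu * (1.0 - mu)
        for j in range(p):
            grad[j] += (yi - mu) * xi[j]
            for k in range(p):
                hess[j][k] += w * xi[j] * xi[k]
    return grad, hess


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_distributed(sites, p, iters=25, tol=1e-10):
    """Central loop: sum per-site summaries, then take one Newton step per round."""
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for X, y in sites:  # patient-level rows never leave local_summaries
            g, h = local_summaries(X, y, beta)
            grad = [a + b for a, b in zip(grad, g)]
            hess = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(hess, h)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical toy data: first column is the intercept, second a covariate.
site_a = ([[1, 0], [1, 1], [1, 2], [1, 3]], [0, 0, 1, 0])
site_b = ([[1, 1], [1, 2], [1, 3], [1, 4]], [1, 0, 1, 1])
print(fit_distributed([site_a, site_b], p=2))
```

Because the summaries add across sites, the coefficients should match a fit on the pooled records up to floating-point error.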
Shared Services and Infrastructure
• SaaS (Software as a Service; Business/Service): Healthcare professionals, End-user services
• PaaS (Platform; Body/Platform): Researchers, Developers, Collaborators
• IaaS (Infrastructure; Frame/Infrastructure): Operators, Developers, Collaborators
• Security & Policies
• Scalability & Reliability
• Flexibility & Extensibility
Underlying Infrastructure
• SaaS: Biomedical Researchers, End-user services
• PaaS: Researchers, Developers, Collaborators
• IaaS: iDASH Operators, Developers, Collaborators
• Resource virtualization
• Security
• Scalability
• Flexibility
Figure courtesy of Dallas Thornton
Cyberinfrastructure Security
• HIPAA (Health Insurance Portability and Accountability Act) compliant computing environment
• Segmentation (zones) of projects & functionality
• Physical and environmental protection of compute hardware
• Access control with two-factor authentication
• Secure (encrypted tunnel) system access and upload capability
• Centralized logging, intrusion detection
• Proxies and filters
• Hardened (secured) system configurations
Shared Infrastructure
Repository for Healthcare & Biomedical Data
Research data from several institutions: clinical & genomic data hosting in a HIPAA compliant facility
• 315 TB cloud and project storage for 100s of virtual servers
• 54 TB high-speed database and system storage; high-performance parallel databases
• 10 Gb redundant network environment; firewall and IDS to address HIPAA requirements
• Multiple-site encrypted storage of critical data
http://idash.ucsd.edu
Patient-Centered Data Sharing
[Figure: consent-driven data request flow]
• User U requests Data D on individual I for Quality Improvement or Research
• Information Exchange Registry: Are the data available? (Yes / No)
• Informed Consent Management System: Do I wish to disclose data D to U? (Yes / No), based on Preferences
• Privacy Registry (Inspection): I can check who or which entity looked (wanted to look) at the data for what reasons
• Trusted Broker(s): Identity Management, Trust Management
• Entities shown: Patient I (Home), Security Entity, Healthcare Entity
AHRQ R01HS19913 / EDM forum, NIH U54HL10846
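The request flow on this slide can be sketched as a broker that checks availability, then the patient's preferences, and records every attempt so the patient can inspect it later. This is a hedged illustration only: the `Request` and `Broker` classes, their fields, and the in-memory log are hypothetical and stand in for the registries named on the slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Request:
    user: str      # User U
    patient: str   # individual I
    data: str      # Data D
    purpose: str   # "Quality Improvement" or "Research"


@dataclass
class Broker:
    available: set     # (patient, data) pairs listed in the exchange registry
    preferences: dict  # (patient, data) -> set of users the patient allows
    privacy_log: list = field(default_factory=list)

    def decide(self, req):
        """Grant only if the data exist and the patient agreed to disclose to this user."""
        key = (req.patient, req.data)
        granted = key in self.available and req.user in self.preferences.get(key, set())
        # Log every attempt, granted or not, for later inspection by the patient.
        self.privacy_log.append((req, granted))
        return granted

    def inspect(self, patient):
        """What the patient sees: who looked (or wanted to look), at what, and why."""
        return [
            (r.user, r.data, r.purpose, g)
            for r, g in self.privacy_log
            if r.patient == patient
        ]


broker = Broker(available={("I", "D")}, preferences={("I", "D"): {"U"}})
print(broker.decide(Request("U", "I", "D", "Research")))
print(broker.decide(Request("V", "I", "D", "Quality Improvement")))
print(broker.inspect("I"))
```

Logging refused requests as well as granted ones matches the slide's wording, "looked (wanted to look)".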
Acknowledgements
• Slides contributed by: Brian Chapman, Claudiu Farcas, Dallas Thornton, Danielle Mowery, Hyeon-eui Kim, Jihoon Kim, Kamalika Chaudhuri, Natasha Balac, Ron Joyce, Shuang Wang, Staal Vinterbo, Vineet Bafna, Wendy Chapman, Winston Armstrong, Xiaoqian Jiang
• Division of Biomedical Informatics
• Funding by: NIH, AHRQ, PCORI, UCOP, UCSD