Panel Discussion: Big Data; Lucila Ohno-Machado, MD, PhD

The analyses upon which this publication is based were performed under Contract Number HHSM-500-2009-00046C sponsored by the Centers for Medicare and Medicaid Services, Department of Health and Human Services.

Panel Discussion: Big Data at the iDASH Center
Lucila Ohno-Machado, MD, PhD
Division of Biomedical Informatics, University of California San Diego
Editor-in-Chief, Journal of the American Medical Informatics Association
Wireless Health 2012


Tuesday, October 23, 2012

Panel Discussion: Big Data

Moderator: Roozbeh Jafari, PhD – Electrical Engineering, UT Dallas

Panelists:
• Holly Jimison, PhD – Medical Informatics & Clinical Epidemiology, OHSU
• James McClain, PhD – Physical Activity Epidemiologist, Risk Factor Monitoring & Methods Branch, National Cancer Institute (NCI)
• Lucila Ohno-Machado, MD, PhD – Associate Dean for Informatics & Technology, School of Medicine; Founding Chief, Division of Biomedical Informatics; Professor of Medicine, UC San Diego

Transcript of Panel Discussion: Big Data; Lucila Ohno-Machado, MD, PhD

Page 1: Panel Discussion: Big Data; Lucila Ohno-Machado, MD, PhD

The analyses upon which this publication is based were performed under Contract Number HHSM-500-2009-00046C sponsored by the Centers for Medicare and Medicaid Services, Department of Health and Human Services.

Panel Discussion: Big Data at the iDASH Center

Lucila Ohno-Machado, MD, PhD
Division of Biomedical Informatics
University of California San Diego
Editor-in-Chief, Journal of the American Medical Informatics Association

Wireless Health 2012

Page 2

21st Century Healthcare

What is the influence of genetics and environment?

Which therapies work best for individual patients?

Page 3

Patient-Centered Outcomes Research

• Genome
  – Sequencing data
• Phenotype
  – Personal monitoring
    • Blood pressure, glucose
  – Personal health records
  – Behavior monitoring
    • Adherence to medication, exercise
• Environment
  – Air sensors, food quality
  – Location

Source: DOE

Page 4

Where does knowledge come from?

• Small controlled studies with strict eligibility criteria
• Does this apply to my patient?

Hopefully, but we need a lot of data to answer this question:
• We need to build infrastructure to access large data repositories
  – Lower the barriers to share data
• We need to share tools to analyze the data
  – Algorithms and computational facilities

Page 5

Big Data, Small Data, and Other Data

• Data integration across biological scales
• Data analysis from multiple sources
• Data ‘anonymization’ and privacy preservation

5/18/2012

Figure: data across biological scales: Genotype (Genome); Transcription (RNA, Transcriptome); Translation (Protein, Proteome); Biomarkers (Lab); Phenotype (Clinical Data); Population (Registries)

Page 6

Clinical Translational Science

• Integration of Clinical Data Warehouses from 5 University of California Medical Centers and affiliated institutions (>10 million patients)
  – Aggregate and individual-level patient data will be accessible according to data use agreements and IRB approval
• Objectives
  – Monitor patient safety
  – Improve outcomes
  – Promote research

Funded by the UC Office of the President to the NIH-funded CTSAs

Page 7

Data for Personalized Medicine

Handling Protected Health Information - Secure Electronic Environment

• Electronic Health Records
• Genetic Data

Prevention, Diagnosis and Therapy
  – Genetic predisposition
  – Biomarkers
  – Pharmacogenomics
  – Health records
  – Sensors

Page 8

Sharing Data

• Data use agreements across institutions
  – Limited and complicated
  – Specific to a particular study
  – Resources for sharing are limited
  – Security/privacy constraints are hard for small institutions to follow
• Sharing data today
  – Little incentive
  – Only one model: users download data
  – Yes/No decision on sharing

Page 9

iDASH

Page 10

Mission

“A national center for biomedical computing that develops new algorithms, open-source tools, computational infrastructure, and services that will enable biomedical and behavioral researchers nationwide to integrate Data for Analysis, ‘anonymization,’ and Sharing”


Page 11

Models for Data Sharing

• Cloud Storage: data exported for computation elsewhere
  – Users download data from the cloud
• Cloud Compute and Virtualization: computation goes to the data
  – Users analyze data in the cloud
  – Users download virtual machines

Funded by NIH U54HL108460

Page 12

Models for Sharing Data Access

Supported by the NIH Grant U54 HL108460 to the University of California, San Diego
04/10/2023

Figure: Data Owner, Tool Creator, and System Creator contribute data 1 and tools 1 to 3 (Contributor DUA, QA); access control governs virtual machines VM 1 and VM 2. Legend: DUA = Data Use Agreement; QA = Quality Assurance.

Page 13

Models for Sharing Data Access


Figure: the same diagram with User A added (User DUA; data 2, tool A).

MODEL 1. User downloads iDASH data

Page 14

Models for Sharing Data Access


Figure: the same diagram with Users A and B added (each with a User DUA).

MODEL 1. User downloads iDASH data

MODEL 2. User computes with iDASH hosted data in iDASH environment

Page 15

Models for Sharing Data Access


Figure: the same diagram with Users A, B, and C added (User C with VM 2 and data C).

MODEL 1. User downloads iDASH data

MODEL 2. User computes with iDASH hosted data in iDASH environment

MODEL 3. User performs iDASH computation in his own environment

Page 16

User requests data for Quality Improvement or Research

Are the data accessible?

• Identity & Trust Management

• Policy enforcement

Trusted Broker(s)

Security Entity

AHRQ R01HS19913 / EDM forum

Quality Improvement, Health Services Research

Count queries and statistics across data warehouses

Diverse Healthcare Entities in 3 different states (federal, state, private)

How many patients over 65 are on Warfarin or Dabigatran?

What are the major and minor bleeding rates for patients on these drugs?
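The count queries on this slide can be sketched as a federated query: each site evaluates the question inside its own warehouse and returns only aggregate counts, and the broker sums them into pooled rates. This is a minimal illustration under our own assumptions, not the system presented here; the record layout (`PatientRecord`), the drug labels, the functions `local_counts` and `combine`, and the reading of "over 65" as `age > 65` are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical labels for the two anticoagulants named on the slide.
DRUGS = ("warfarin", "dabigatran")


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical row layout inside one site's data warehouse."""
    age: int
    drug: str
    major_bleed: bool
    minor_bleed: bool


def local_counts(records, min_age=65):
    """Runs inside each site: returns per-drug counts only, never patient rows."""
    counts = {drug: Counter() for drug in DRUGS}
    for r in records:
        if r.age > min_age and r.drug in counts:
            c = counts[r.drug]
            c["patients"] += 1
            c["major"] += int(r.major_bleed)
            c["minor"] += int(r.minor_bleed)
    return counts


def combine(site_results):
    """Runs at the broker: sums per-site counts and derives pooled bleeding rates."""
    pooled = {drug: Counter() for drug in DRUGS}
    for result in site_results:
        for drug, c in result.items():
            pooled[drug].update(c)  # Counter.update adds counts
    summary = {}
    for drug, c in pooled.items():
        n = c["patients"]
        summary[drug] = {
            "patients": n,
            "major_rate": c["major"] / n if n else 0.0,
            "minor_rate": c["minor"] / n if n else 0.0,
        }
    return summary


# Usage with two toy sites (made-up records, not real data).
site_a = [
    PatientRecord(70, "warfarin", True, False),
    PatientRecord(80, "dabigatran", False, True),
    PatientRecord(60, "warfarin", True, True),  # excluded: not over 65
]
site_b = [
    PatientRecord(66, "warfarin", False, True),
    PatientRecord(90, "warfarin", False, False),
]
summary = combine([local_counts(site_a), local_counts(site_b)])
print(summary)
```

Only the small per-drug dictionaries cross institutional boundaries, which is what lets sites in different states answer the question without exchanging patient-level records.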

Page 17

User requests data for Quality Improvement or Research

Are the data accessible?

• Identity & Trust Management

• Policy enforcement

Trusted Broker(s)

Security Entity

AHRQ R01HS19913 / EDM forum

Adjusting for Confounders

Distributed regression models

Wu Y et al. Grid Binary LOgistic REgression (GLORE): Building Shared Models Without Sharing Data. JAMIA 2012

Diverse Healthcare Entities in 3 different states (federal, state, private)
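The distributed regression idea cited on this slide (fitting one logistic model across sites without pooling patient rows) can be sketched with Newton-Raphson updates in which each site shares only its gradient and Hessian contributions for the current coefficients. This is a simplified pure-Python sketch in the spirit of GLORE, not the published implementation; the function names, the iteration limit, the tolerance, and the absence of regularization or any protection of the shared aggregates are our own assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_statistics(X, y, beta):
    """Computed at one site: gradient X^T(y - p) and information matrix X^T W X."""
    d = len(beta)
    grad = [0.0] * d
    hess = [[0.0] * d for _ in range(d)]
    for xi, yi in zip(X, y):
        p = sigmoid(sum(b * x for b, x in zip(beta, xi)))
        w = p * (1.0 - p)
        for j in range(d):
            grad[j] += (yi - p) * xi[j]
            for k in range(d):
                hess[j][k] += w * xi[j] * xi[k]
    return grad, hess


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def distributed_logistic_regression(sites, d, iters=25, tol=1e-10):
    """Coordinator: sums site statistics, then takes a Newton-Raphson step."""
    beta = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        hess = [[0.0] * d for _ in range(d)]
        for X, y in sites:  # only aggregates leave each site
            g, h = local_statistics(X, y, beta)
            for j in range(d):
                grad[j] += g[j]
                for k in range(d):
                    hess[j][k] += h[j][k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Usage: two toy sites; the first column of X is the intercept (made-up data).
site_1 = ([[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 3.0]], [0, 1, 0, 1])
site_2 = ([[1.0, 1.0], [1.0, 2.5], [1.0, 3.5], [1.0, 0.8]], [0, 1, 1, 0])
beta = distributed_logistic_regression([site_1, site_2], d=2)
print(beta)
```

Because the summed gradients and information matrices equal those of the pooled data, the coefficients should match a single-site fit on all rows up to floating-point rounding.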

Page 18

Shared Services and Infrastructure


Figure: iDASH layered services
• SaaS (Software as a Service): Healthcare professionals; end-user services (Business/Service)
• PaaS (Platform as a Service): Researchers, Developers, Collaborators (Body/Platform)
• IaaS (Infrastructure as a Service): Operators, Developers, Collaborators (Frame/Infrastructure)

• Security & Policies
• Scalability & Reliability
• Flexibility & Extensibility

Page 19

Underlying Infrastructure


Figure: iDASH service layers
• SaaS: Biomedical Researchers; end-user services
• PaaS: Researchers, Developers, Collaborators
• IaaS: iDASH Operators, Developers, Collaborators

• Resource virtualization
• Security
• Scalability
• Flexibility

Figure courtesy of Dallas Thornton

Page 20

Cyberinfrastructure Security

• HIPAA (Health Insurance Portability and Accountability Act) compliant computing environment
• Segmentation (zones) of projects & functionality
• Physical and environmental protection of compute hardware
• Access control with two-factor authentication
• Secure (encrypted tunnel) system access and upload capability
• Centralized logging, intrusion detection
• Proxies and filters
• Hardened (secured) system configurations


Page 21

Shared Infrastructure

Research data from several institutions: clinical & genomic data hosting in a HIPAA compliant facility

• 315 TB cloud and project storage for 100s of virtual servers

• 54 TB high-speed database and system storage; high-performance parallel databases

• 10 Gb redundant network environment; firewall and IDS to address HIPAA requirements

• Multiple-site encrypted storage of critical data

Page 22

Repository for Healthcare & Biomedical Data


Page 23


http://idash.ucsd.edu

Page 24

Patient-Centered Data Sharing

User U requests Data D on individual I for Quality Improvement or Research

Are the data available? (Yes/No)

Figure components:
• Trusted Broker(s) and Security Entity: Identity Management, Trust Management
• Informed Consent Management System, with the patient's Preferences: "Do I wish to disclose data D to U?" (Yes/No)
• Information Exchange Registry
• Privacy Registry, for Inspection: "I can check who or which entity looked (wanted to look) at the data for what reasons"
• Patient I (Home); Healthcare Entity

AHRQ R01HS19913 / EDM forum, NIH U54HL10846
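The request flow on this slide (identity and trust checks, the patient's disclosure preferences, and a registry the patient can inspect) can be sketched as a toy broker. Everything here is a hypothetical illustration: the class names, consent stored per patient, data item, user, and purpose, and a single in-memory audit log stand in for the separate Informed Consent Management System, Information Exchange Registry, and Privacy Registry shown in the figure.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical request: User U asks for Data D on individual I."""
    user: str
    patient: str
    data_item: str
    purpose: str  # e.g. "quality_improvement" or "research"


@dataclass
class TrustedBroker:
    """Toy broker combining identity checks, patient consent, and an audit trail."""
    trusted_users: set = field(default_factory=set)
    # (patient, data_item) -> set of (user, purpose) pairs the patient allows
    consent: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def set_preference(self, patient, data_item, user, purpose, allow=True):
        """Patient-side preference: disclose data_item to user for this purpose?"""
        allowed = self.consent.setdefault((patient, data_item), set())
        if allow:
            allowed.add((user, purpose))
        else:
            allowed.discard((user, purpose))

    def request(self, req):
        """Grant only if the user is trusted AND the patient has consented."""
        trusted = req.user in self.trusted_users
        consented = (req.user, req.purpose) in self.consent.get(
            (req.patient, req.data_item), set()
        )
        granted = trusted and consented
        # Log every attempt, granted or not, so the patient can inspect it later.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": req.user,
            "patient": req.patient,
            "data_item": req.data_item,
            "purpose": req.purpose,
            "granted": granted,
        })
        return granted

    def inspect(self, patient):
        """Who looked (or wanted to look) at this patient's data, and why."""
        return [entry for entry in self.audit_log if entry["patient"] == patient]


# Usage with made-up identifiers.
broker = TrustedBroker(trusted_users={"user_u"})
broker.set_preference("patient_i", "data_d", "user_u", "research")
ok = broker.request(AccessRequest("user_u", "patient_i", "data_d", "research"))
denied = broker.request(
    AccessRequest("user_u", "patient_i", "data_d", "quality_improvement")
)
for entry in broker.inspect("patient_i"):
    print(entry["user"], entry["purpose"], entry["granted"])
```

Logging denied attempts as well as granted ones is what lets the patient see who "wanted to look" at the data, not only who succeeded.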

Page 25

Acknowledgements

• Slides contributed by Brian Chapman, Claudiu Farcas, Dallas Thornton, Danielle Mowery, Hyeon-eui Kim, Jihoon Kim, Kamalika Chaudhuri, Natasha Balac, Ron Joyce, Shuang Wang, Staal Vinterbo, Vineet Bafna, Wendy Chapman, Winston Armstrong, Xiaoqian Jiang

• Division of Biomedical Informatics

• Funding by NIH, AHRQ, PCORI, UCOP, UCSD