iDASH National Center for Biomedical Computing
Sharing and Protecting Human Subjects Data
9/30/15 pSCANNER/iDASH all-hands
NIH U54 HL108460
Lucila Ohno-Machado, MD, MBA, PhD
Biomedical Informatics, University of California San Diego
Patient Interaction
Data Analysis: Statistics, Machine Learning
Data Structuring: Natural Language Processing, Data Modeling
Predictive Modeling: Evaluation Methods
Decision Support Tools: Guidelines, Alerts & Reminders
Data Collection Tools: Clinical Data Warehouse
Data Integration: Genomics, Proteomics, Sensors
Data De-Identification: Privacy Technology
Communication Strategies: Consumer Health Informatics, Medical Education
Knowledge & Tools
Privacy
Consent
Data
Our Goals
• Share access to data and computation
• Train the new generation of data scientists
• Provide innovative software, platform, and infrastructure
• Protect privacy
Develop
» Algorithms
» Tools
» Infrastructure
» Policies
iDASH
Knowledge & Tools
Services
Platform
Data
Sensors
Genomic
Clinical
Service
WWW
Apps
Exec.
Aggreg.
Hosting
Sharing
Policies
Platform
Research
Develop.
Federation
Postdocs: Meng, Luca, Wenrui
Exchange agreements with universities in Austria, Brazil, China
PhD and master’s students: Eric, Wei, Zhanglong
Yuan Wu, Asst Prof, Duke
Myoung La, Software Engineer, Microsoft
Shuang Wang, Asst Prof, UCSD, K99/R00
Adela Grando, Asst Prof, ASU
Elizabeth Bell, Research Asst, UCSD
Xiaoqian Jiang, Asst Prof, UCSD, K99/R00
Past Postdocs
Past Interns
Trainees
Mike Conway, Asst Prof, U Utah, K99/R00
Undergrad Students
Tyler Bath, Programmer, UCSD
Alex Hsieh, PhD DBMI student, Columbia University
2011
Kaushik Sinha, Asst Prof, Wichita St
NLM Training Grant started 2012 (9 pre- and 6 postdoc slots)
Graduates from the postdoc program
Mindy (Asst Prof UCLA)
Augustine (PhD program)
Dyvia (Fellowship in Resp Med UCSD)
Edna (Residency in Surgery)
iDASH internship 5 years in a row
[Bar chart: number of interns per year, 2011 to 2015. Values shown, in order: 16, 13, 8, 13, 13]
Internship Symposium 2015
10/15/2015 Supported by the NIH Grant U54 HL108460 to the University of California, San Diego
Daniel Garcia Ulloa, Emory University, 4th year graduate student
Pallavi Rao, UC Davis, 2nd year graduate student
Lei Yang, University of Oklahoma, 3rd year graduate student
Dima Aref, New Jersey Institute of Technology, Senior, undergraduate
Suyash Rathi, Syracuse University, 2nd year graduate student
Lu Wang, Oregon State University, 4th year graduate student
Haoyi Shi, Syracuse University, 1st year graduate student
Dong Han, University of Oklahoma, 2nd year graduate student
Chao Jian, University of Oklahoma, 1st year graduate student
Ko Dokmai, University of Virginia, Will join UMD as a graduate
Michele Dow, Boston University, Will join UCSD as a PhD
Rodrigo Gama Baptista, Federal University of Parana, Brazil, Junior, undergraduate
Yerlan Idelbayev, UCSD, 2nd year graduate student
2015 Cohort
Workshops, Symposia, Webinars
• 12 Workshops
https://idash.ucsd.edu/events/workshops
» 4 Privacy
» 2 NLP
» 2 Imaging Informatics
» 4 Others (High Performance Computing, Biomedical Data Sharing, IEEE HISB, Mobile Data)
• 10 Symposia
https://idash.ucsd.edu/news-and-events
» 5 All-Hands
» 5 Internship
• 87 Webinars
Publications
• Published Articles and Book Chapters: 138
• Presentations: 244
• Posters: 72

Topic                                    # Published
Cell Biology                             2
Cloud Computing and Architecture         1
Data Analysis and Compression            5
Data Modeling and Integration            4
Data Sharing                             5
Genomics                                 28
Imaging Informatics                      4
Infrastructure                           4
Kawasaki Disease (DBP 1 & 4)             13
Natural Language Processing              7
Patient Centered Research                9
Physical Activity Monitoring (DBP 3)     2
Privacy Technology                       41
Statistics                               13
Total                                    138
https://idash.ucsd.edu/publications
As of 6/4/15
Integrating Different Types of Data
[Figure: Genotype (genome), then transcription to RNA (transcriptome), then translation to Protein (proteome); Metabolites; Physiology (laboratory tests); Phenotype (physical exam, imaging, monitoring systems)]
● Predictive modeling and adjustment for confounders require lots of data
● Some institutions cannot move data outside their firewalls, so we can bring computation to the data
User requests data for Quality Improvement or Research
• Identity & Trust Management
• Policy enforcement
Trusted Broker(s)
Security Entity
Diverse Healthcare Entities in 3 different states (federal, state, private)
Analysis: Distributed computing
Scalable National Network for Comparative Effectiveness Research
Wu Y et al. Grid Binary LOgistic REgression (GLORE): Building Shared Models Without Sharing Data. JAMIA, 2012
Wang S et al. EXpectation Propagation LOgistic REgRession (EXPLORER): Distributed Privacy-Preserving Online Model Learning. J Biomed Inf 2013
Jiang W et al. WebGLORE: A Webservice for Grid Logistic Regression. Bioinformatics 2014
Wu Y et al. Grid Multi-Category Response Logistic Models. BMC Med Inform Dec Making 2015
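The idea of building a shared model without sharing data can be sketched as follows. This is a minimal illustration, not the published GLORE code: it assumes each site returns only the gradient and information matrix of a logistic regression log-likelihood, and a coordinator sums them and takes Newton-Raphson steps. The function names, the small linear solver, and the toy site data are all assumptions made for this example.

```python
import math

def local_summaries(X, y, beta):
    """Run at one site. Returns the gradient and information matrix for the
    current coefficients. Only these aggregates leave the site, not the rows."""
    d = len(beta)
    grad = [0.0] * d
    info = [[0.0] * d for _ in range(d)]
    for row, label in zip(X, y):
        z = sum(b * x for b, x in zip(beta, row))
        p = 1.0 / (1.0 + math.exp(-z))
        w = p * (1.0 - p)
        for i in range(d):
            grad[i] += (label - p) * row[i]
            for j in range(d):
                info[i][j] += w * row[i] * row[j]
    return grad, info

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_distributed(sites, d, iters=25):
    """Coordinator: sum the site summaries, then take a Newton-Raphson step."""
    beta = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        info = [[0.0] * d for _ in range(d)]
        for X, y in sites:
            g, h = local_summaries(X, y, beta)
            grad = [a + b for a, b in zip(grad, g)]
            info = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(info, h)]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Toy data (hypothetical): first column is the intercept, second a predictor.
site_a = ([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]], [0, 0, 1, 0])
site_b = ([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]], [1, 0, 1, 1])
beta_distributed = fit_distributed([site_a, site_b], d=2)
beta_pooled = fit_distributed([(site_a[0] + site_b[0], site_a[1] + site_b[1])], d=2)
```

Because the gradient and information matrix are sums over patients, adding the per-site summaries gives the same Newton step as pooling the rows, so `beta_distributed` and `beta_pooled` agree up to floating-point error.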
Horizontal and Vertical Partitions
Patient Age Insurance
A1 45 X
A2 32 Y
Patient Age Insurance
B1 45 Y
B2 32 Y
Patient Age Insurance
A1 45 X
A2 32 Y
Li Y, Jiang X, Wang S, Xiong L, Ohno-Machado L. VERTIcal Grid lOgistic regression (VERTIGO) – accepted in the J Am Med Inf Assoc.
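As a small illustration of the two partition types, the sketch below splits the same toy records both ways. It assumes the slide's tables show sites A and B holding different patients (horizontal) and different sites holding different columns for the same patients (vertical). The site names `clinic` and `payer` and both helper functions are hypothetical.

```python
# Toy records mirroring the slide's tables (patient, age, insurance).
records = [
    {"patient": "A1", "age": 45, "insurance": "X"},
    {"patient": "A2", "age": 32, "insurance": "Y"},
    {"patient": "B1", "age": 45, "insurance": "Y"},
    {"patient": "B2", "age": 32, "insurance": "Y"},
]

def horizontal_partition(rows, site_of):
    """Split by patient: each site holds every column for its own patients."""
    parts = {}
    for row in rows:
        parts.setdefault(site_of(row), []).append(row)
    return parts

def vertical_partition(rows, key, column_groups):
    """Split by column: each site holds some columns for every patient,
    plus the shared key needed to line the rows up again."""
    return {
        site: [{key: r[key], **{c: r[c] for c in cols}} for r in rows]
        for site, cols in column_groups.items()
    }

# Horizontal: the first letter of the patient ID stands in for the site.
by_site = horizontal_partition(records, lambda r: r["patient"][0])

# Vertical: a hypothetical clinic holds age, a hypothetical payer holds insurance.
by_column = vertical_partition(
    records, "patient", {"clinic": ["age"], "payer": ["insurance"]}
)
```

In the horizontal case each site can compute complete per-patient terms locally. In the vertical case no single site sees a full row, which is why a separate method is needed.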
iDASH 2014 First Privacy Protection Challenge
• Task 1: Privacy-preserving SNP Data Sharing
• Task 2: Privacy-preserving release of top K most significant SNPs
Evaluate solutions of guaranteed privacy protection for protecting the output of genomic data analysis
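One common way to obtain a formal privacy guarantee for released statistics is differential privacy. The sketch below is not a challenge entry. It assumes the released output is a vector of per-SNP minor-allele counts, that neighboring datasets differ by adding or removing one person, and that each person contributes at most 2 alleles per SNP. The function names and toy genotypes are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def release_allele_counts(genotypes, epsilon, rng):
    """Release noisy per-SNP minor-allele counts with the Laplace mechanism.

    genotypes: one list per person, holding 0, 1, or 2 minor alleles per SNP.
    Assumption: one person changes each of the m counts by at most 2,
    so the L1 sensitivity of the whole count vector is 2 * m.
    """
    m = len(genotypes[0])
    true_counts = [sum(person[j] for person in genotypes) for j in range(m)]
    scale = (2 * m) / epsilon
    return [count + laplace_noise(scale, rng) for count in true_counts]

# Toy cohort of three people and three SNPs (hypothetical values).
cohort = [[0, 1, 2], [1, 1, 0], [2, 0, 0]]
noisy_counts = release_allele_counts(cohort, epsilon=1.0, rng=random.Random(7))
```

A smaller `epsilon` means more noise and stronger protection. The noise scale grows with the number of SNPs released, which is one reason releasing only the top K SNPs is treated as a separate task.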
2015 Privacy Protection Challenge
• Task 1: Homomorphic encryption (HME) based secure genomic data analysis
• Task 2: Secure comparison between genomic data in a distributed setting
• Focus on secure outsourcing and secure data analysis in a distributed setting (humangenomeprivacy.org)
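To make secure outsourcing concrete, here is a toy additively homomorphic scheme (Paillier) in which a server adds encrypted genotype values without ever decrypting them. It is a sketch under strong assumptions: the primes are tiny and fixed, so it is insecure, and it is not one of the schemes submitted to the challenge. All function names are chosen for this example. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import random

def keygen(p=293, q=433):
    """Toy Paillier key generation with tiny fixed primes (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng):
    """Encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)

def decrypt(n, private_key, c):
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

# The data owner encrypts minor-allele counts for one SNP across four people.
rng = random.Random(0)
n, private_key = keygen()
ciphertexts = [encrypt(n, g, rng) for g in [0, 1, 2, 1]]

# The untrusted server sums them without seeing any genotype.
encrypted_total = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_total = add_encrypted(n, encrypted_total, c)

# Only the key holder can read the total allele count.
total = decrypt(n, private_key, encrypted_total)
```

Additive schemes like this cover sums and counts. Analyses that also need multiplication on encrypted values call for more expressive homomorphic schemes.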
Genome Privacy Challenge 2015
Winners for Homomorphic Encryption
• Stanford/MIT
• IBM
• Microsoft
Consent Management System
Do I wish to disclose data D to U?
Sharing Look-up
Yes
Patient I
Patient Interface
I can check that U looked at my data D
• Data use agreements
• Study registry
Trusted broker
Healthcare Institutions
User U requests Data D on individual I
Sharing
Ohno-Machado L. To Share or Not To Share: That Is Not the Question. Science Translational Medicine, 2012 4(165)
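The consent flow on this slide (patient I decides whether to disclose data D to user U, the trusted broker looks up that decision on each request, and the patient can later check who looked) might be sketched as below. This is an illustrative data structure, not the iDASH system. The class and method names are hypothetical, and deny-by-default is an assumption made here.

```python
from dataclasses import dataclass, field

@dataclass
class ConsentBroker:
    """Hypothetical trusted broker holding sharing preferences and an access log."""
    # (patient, data_item, user) -> True if the patient agreed to disclose
    preferences: dict = field(default_factory=dict)
    # One entry per request: (patient, data_item, user, allowed)
    access_log: list = field(default_factory=list)

    def set_preference(self, patient, data_item, user, allow):
        """Patient I answers: do I wish to disclose data D to U?"""
        self.preferences[(patient, data_item, user)] = allow

    def request(self, user, data_item, patient):
        """User U requests data D on individual I (sharing look-up).
        Assumption: anything without an explicit 'yes' is denied."""
        allowed = self.preferences.get((patient, data_item, user), False)
        self.access_log.append((patient, data_item, user, allowed))
        return allowed

    def audit(self, patient):
        """Patient I checks which users were granted access to which data."""
        return [(d, u) for p, d, u, ok in self.access_log if p == patient and ok]

broker = ConsentBroker()
broker.set_preference("patient-1", "lab-results", "researcher-9", True)
granted = broker.request("researcher-9", "lab-results", "patient-1")
denied = broker.request("researcher-9", "genome", "patient-1")
seen_by = broker.audit("patient-1")
```

Logging denied requests as well as granted ones keeps the record complete, while `audit` reports only the accesses that actually exposed data.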
homomorphic encryption
secure multiparty computation
iDASH “commons”
Sharing Data, Tools, Systems
differential privacy
indexing
Research Data, Clinical Data, Applications, Integration
2008-2009, 2010-2011, 2012-2013, 2014-2015
Electronic Health Record System: Epic & Clarity
Other Systems: PACS, lab, etc.
Personnel Systems: Active Directory
Query Tools: UC-ReX Explorer
Privacy Technology
Clinical Research Data: REDCap, Velos, Other DBs
iDASH HIPAA SHADE: Images, human genomes, etc.
Analytical Tools
Recruitment Consent tools
Custom Apps
VA LA Clinics
UCSF
Davis
Irvine
UCLA
Healthcare Clinical Data: Clinical Data Warehouse for Research
Scalable Network (Distributed Analytics Tools)
HIPAA
External data (patient reported data, sensors)
pSCANNER: PCORI CDRN
iDASH HIPAA/FISMA OVERCAST: iDASH, CTRI, School of Medicine
De-ID Tools
UCSD Health Sciences: Building Protected Health Information Networks
SCANNER
BRIGHT
iDASH
PhenDISCO NLM Training Grant
K22, K99s
PCORI contracts
Private Cloud
iCONCUR
UC-ReX
pSCANNER
Accrual for Clinical Trials
CTSA renewal
bioCADDIE
R21, subcontracts
Health System Department
USC/LAC Cedars Sinai
San Mateo
Epic, CDDS, New modules
Intermountain
iDASH On-Demand Resources
Safe HIPAA-compliant Annotated Data deposit box Environment
On-demand Virtualized Elastic Resilient Compute And Storage Technology
HIPAA and non-public data
public data, tools, recipes
Powered by MIDAS
Data Tools Recipes
upload & download data
compute request, direct upload & download of proprietary data, tool, recipe
middleware and HIPAA security developed by iDASH
Compute nodes, Memory, Disk storage, Networking
Powered by VMware
AUTOMATED
Clinical Research Informatics CTRI
Clinical Trial Management System, REDCap
Data Concierge Service
Management of iDASH HIPAA cloud
iDASH SHADE Repositories
• Based on Kitware MIDAS open-source technology
• File-level access control
• Separate PHI and Non-PHI repositories
• Two Factor Auth (PHI)
https://idash-data.ucsd.edu/
Institutions with Signed Agreements
DCA
• National
» UCSD
» Children’s Health Care of Atlanta (GA)
» Long Beach Veterans Affairs Medical Center
» Ortho Kinematics (TX)
• International
» Mahidol University (Thailand)
DUA
• National
» UCSD
» Databetes (NY)
» Tin Man Labs, LLC (TX)
» UMass Dartmouth
» Georgia Institute of Technology
» University of Utah
» The Ola Grimsby Institute (CA)
» The Methodist Hospital Research Institute (TX)
» Wake Forest University Health Systems (NC)
• International
» North West London Hospitals NHS Trust (UK)
» The University Hospital of Leuven (Belgium)
» INRIA (France)
» Newton Circus Pte. Ltd. (Singapore)
Repeatable Results
Workflow
Short reads
Index reference
Align to reference
Call variants
Annotate variants
Pick high impact
Deleterious SNPs
Blueprint
Workflow
Short reads
Index reference
Align to reference
Call variants
Annotate variants
Pick high impact
Deleterious SNPs
Context
Reference DB
Test data
Configuration
Helper tools
OS
Instance
Workflow
Short reads
Index reference
Align to reference
Call variants
Annotate variants
Pick high impact
Deleterious SNPs
Context
Reference DB
Test data
Configuration
Helper tools
OS
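One way to read the blueprint and instance slides: a blueprint pins a workflow together with its context (reference DB, test data, configuration, helper tools, OS), and each instance is a run created from that blueprint, so results are repeatable. The sketch below assumes that reading. The class names, context values, and the placeholder step execution are all hypothetical, and no real tools are invoked.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Blueprint:
    """Immutable description of what to run and the environment to run it in."""
    workflow: tuple  # ordered step names
    context: tuple   # (key, value) pairs pinning the environment

@dataclass
class Instance:
    """One run created from a blueprint."""
    blueprint: Blueprint
    log: list = field(default_factory=list)

    def run(self, short_reads):
        for step in self.blueprint.workflow:
            # Placeholder: a real system would invoke the tool for this step.
            self.log.append(step)
        return {
            "input": short_reads,
            "steps": list(self.log),
            "context": dict(self.blueprint.context),
        }

# Step names follow the slide; context values are made-up placeholders.
variant_blueprint = Blueprint(
    workflow=(
        "index reference",
        "align to reference",
        "call variants",
        "annotate variants",
        "pick high impact",
    ),
    context=(
        ("reference_db", "hypothetical-reference-v1"),
        ("configuration", "hypothetical-config"),
        ("os", "hypothetical-os-image"),
    ),
)

first_run = Instance(variant_blueprint).run("sample-reads")
second_run = Instance(variant_blueprint).run("sample-reads")
```

Because the blueprint is frozen and every instance reads its steps and context from it, two instances given the same input record the same steps under the same context.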
iDASH On-Demand Resources
Bookshelf
MyDATA
Input
Results
Instance
External Data
Collaborative Projects
Linked R01s
• Cardiac Atlas Project (R01HL121754)
» Goal: Develop accurate new methods for analyzing cardiac shape, mechanics and blood flow in CHD patients
• CYCORE: Cyberinfrastructure for Cancer Comparative Effectiveness Research (R01CA177996)
» Goal: Develop a system that improves the capture of patient-reported and objectively measured data from patients in cancer clinical trials
• Privacy-Preserved Sharing and Analysis of Human Genomic Data (R01HG007078)
» Goal: Study and develop a suite of innovative and transformative techniques aimed at achieving practical and cost-effective genomic data protection
• SHARE: Statistical Health Information Release with Differential Privacy (R01GM114612)
» Goal: Develop a toolkit for enabling privacy-preserving health information release to cover different data modalities and study needs
PCORI-funded methods grant to collaborator Li Xiong from Emory
NSF-funded infrastructure grant to collaborator Kevin Patrick
R21 on cloud privacy to Xiaoqian Jiang
The Near Future
• Ethics technology
» Equip policy makers with algorithms and tools to support ethics (including privacy)
• Serve HIPAA storage and compute needs of a larger community
» Data Discovery Index prototype environment
» Private cloud for protected health information
• Hub infrastructure for large HIPAA-data networks
» FISMA ATO
» Distributed computing