Preserving privacy in data sharing - AAMC · Data sharing in multi-center studies • Benefits and...

Preserving privacy in data sharing Darren Toh, ScD Department of Population Medicine Harvard Medical School & Harvard Pilgrim Health Care Institute December 7, 2017

Page 1: Preserving privacy in data sharing - AAMC · Data sharing in multi-center studies • Benefits and challenges • Ways to facilitate data sharing while protecting privacy • Stakeholders’

Preserving privacy in data sharing

Darren Toh, ScD

Department of Population Medicine

Harvard Medical School & Harvard Pilgrim Health Care Institute

December 7, 2017

Disclosures

• The work presented here is/was supported by
  • Patient-Centered Outcomes Research Institute (ME-1403-11305)
  • Office of the Assistant Secretary for Planning and Evaluation
  • Food and Drug Administration (HHSF223200910006I)
  • National Institutes of Health (U01EB023683)
  • Agency for Healthcare Research and Quality (R01HS019912)

• All statements in this presentation, including its findings and conclusions, are solely those of the authors and do not necessarily represent the views of AHRQ, ASPE, FDA, NIH, PCORI, or PCORI’s Board of Governors or Methodology Committee

2

Overview

• Data sharing in multi-center studies
  • Benefits and challenges

• Ways to facilitate data sharing while protecting privacy
  • Stakeholders’ views on data sharing
  • Use of distributed data networks
  • Use of privacy-protecting analytic and data-sharing methods

• Discussion

3

Multi-database studies

• Many studies are now done in multi-database settings

5

Benefits of multi-database studies

• Larger sample sizes
  • Allow studies of rare treatments or rare outcomes
  • Allow studies in specific subpopulations
  • Allow studies to be done more quickly

• More diverse populations
  • Allow more generalizable findings
  • Allow assessment of treatment effect heterogeneity

6

Types of data shared

• Insurance claims

• Electronic health records (inpatient & outpatient)

• Registries (e.g., birth, immunization, disease, treatment)

• Genomic data

• Patient-generated data

7

Multi-database studies

[Diagram: analysis center connected to Site 1, Site 2, and Site 3]

Pooling the entire databases

10

Concerns about data sharing

• Patient privacy and confidentiality

• Data security

• Unauthorized use of data

• Inaccurate analysis or interpretation of data

• Disclosure of sensitive institutional or corporate info

• Contractual restrictions

11

Data sharing in multi-database studies

Data we need to conduct the desired analysis

What data partners are willing or able to share

12

Understand factors that influence data sharing

• Semi-structured interviews with key stakeholders

• Identify factors that facilitate data sharing

• Identify concerns that discourage data sharing

14

Stakeholders interviewed

Mazor et al, J Comp Eff Res, 2017;6(6):537-547

15

Stakeholder interview domains

Mazor et al, J Comp Eff Res, 2017;6(6):537-547

16

Findings from stakeholder interviews

Mazor et al, J Comp Eff Res, 2017;6(6):537-547

17

Distributed data network (DDN) architecture

• No pooling of the entire databases from all sites

• Data partners maintain physical control of their data

• Data partners have the ability to opt out of any request

• Only the minimum necessary information is transferred

19

Distributed data networks – Vanilla version

[Diagram: analysis center connected to Site 1, Site 2, and Site 3]

Pooling study-specific individual-level datasets

21

Examples of distributed data networks

22

Typical analytic datasets shared in DDNs

Site 1

PatID  Exposure  Outcome  Time  X1  X2  X3  X4  X5  …
001    1         0        312   0   1   0   1   1   …
002    0         0        40    1   1   0   1   0   …
003    0         0        365   1   0   0   0   0   …
004    0         0        200   2   0   1   0   0   …
005    0         1        2     3   0   0   1   0   …
006    1         1        15    3   1   0   0   1   …
007    1         0        4     1   1   1   0   1   …
008    1         0        145   0   0   1   0   0   …
009    0         1        33    2   1   0   0   0   …
010    0         0        98    1   1   0   0   0   …
011    0         0        34    1   0   0   0   0   …
…      …         …        …     …   …   …   …   …   …

23

Typical analytic datasets shared in DDNs

[Same Site 1 table as on slide 23]

Each row represents an individual

24

Typical analytic datasets shared in DDNs

[Same Site 1 table as on slide 23]

Each column represents a variable

25

Standardizing databases

Adapted from: http://www.hcsrn.org/asset/b9efb268-eb86-400e-8c74-2d42ac57fa4F/VDW.Infographic031511.jpg

[Diagram: individual data partners (Sites 1 to 4); data standardization (common data model) at each of Sites 1 to 4; data accessible to research projects; data quality improvement feedback loop]

• Research projects

• Programs written against common data model

26

Distributed analysis

[Diagram: the Analysis Center exchanges queries and outputs with Data Partner 1 and Data Partner 2 through a Secure Network Portal. Each data partner holds its own data (enrollment, demographics, utilization, pharmacy, etc.) and has "Review & Run Query" and "Review & Return Output" steps. Numbered markers 1 to 6 on the diagram correspond to the steps below.]

1- User creates and submits query

2- Data Partners retrieve query

3- Data Partners review and run query against their local data

4- Data Partners review output

5- Data Partners return outputs via secure network

6- Outputs are aggregated and analyzed

https://www.sentinelinitiative.org/privacy-and-security

32
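
The six steps of the distributed analysis workflow can be sketched in a few lines of Python. This is an illustrative toy, not the software of Sentinel or any other network: the `DataPartner` class, its `opts_out` flag, the `count_exposed_events` query, and the record layout are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A query is a program that runs on local records and returns summary output only.
Query = Callable[[list], dict]


@dataclass
class DataPartner:
    """Hypothetical data partner that keeps physical control of its records."""
    name: str
    records: list
    opts_out: bool = False  # a partner may decline any request

    def handle(self, query: Query) -> Optional[dict]:
        # Steps 2-3: retrieve the query, review it, run it against local data.
        if self.opts_out:
            return None
        output = query(self.records)
        # Steps 4-5: review the output, then return only that output.
        return output


def count_exposed_events(records: list) -> dict:
    """Example query: returns counts; no patient-level rows leave the site."""
    exposed = [r for r in records if r["exposure"] == 1]
    return {
        "n_exposed": len(exposed),
        "events_exposed": sum(r["outcome"] for r in exposed),
    }


def run_distributed_query(partners: list, query: Query) -> dict:
    # Step 1: the user submits the query. Step 6: outputs are aggregated.
    totals: dict = {}
    for partner in partners:
        output = partner.handle(query)
        if output is None:
            continue  # this partner opted out of the request
        for key, value in output.items():
            totals[key] = totals.get(key, 0) + value
    return totals


partners = [
    DataPartner("Data Partner 1", [{"exposure": 1, "outcome": 0},
                                   {"exposure": 1, "outcome": 1},
                                   {"exposure": 0, "outcome": 0}]),
    DataPartner("Data Partner 2", [{"exposure": 1, "outcome": 1},
                                   {"exposure": 0, "outcome": 1}]),
    DataPartner("Data Partner 3", [{"exposure": 1, "outcome": 1}], opts_out=True),
]
result = run_distributed_query(partners, count_exposed_events)
```

The analysis center only ever sees the dictionaries returned by each partner, which is the sense in which the analysis is sent to the data.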

Sharing patient-level datasets in DDNs

• Patient-level info can generally be de-identified to avoid sharing of sensitive patient info

• But even so, several concerns may still persist

• Sometimes it is not possible to share patient-level info due to these concerns or other reasons

33

Challenges in de-identifying patient information

34

Question 1

• Do we have other ways to share data?

35

Question 2

• Can we perform the analysis we want without sharing potentially identifiable patient-level data?

36

Question 3

• Better yet, can we perform the analysis we want without sharing patient-level data at all?

37

Not using privacy-protecting analytic methods

[Diagram: analysis center connected to Site 1, Site 2, and Site 3]

39

Using privacy-protecting analytic methods

[Diagram: analysis center connected to Site 1, Site 2, and Site 3]

Pooling study-specific summary-level datasets

41

Confounder summary scores

[Diagram: confounders (age, sex, race, Dx, Px, Tx) linked to both treatment and outcome, and summarized as a propensity score (PS) or a disease risk score (DRS)]

42

Typical analytic datasets shared in DDNs

[Same Site 1 table as on slide 23: one row per patient, with Exposure, Outcome, Time, and individual covariates X1 to X5]

43

Using confounder summary scores

Site 1

PatID  Exposure  Outcome  Time  PS
001    1         0        312   0.33
002    0         0        40    0.04
003    0         0        365   0.05
004    0         0        200   0.54
005    0         1        2     0.22
006    1         1        15    0.45
007    1         0        4     0.09
008    1         0        145   0.79
009    0         1        33    0.21
010    0         0        98    0.01
011    0         0        34    0.38
…      …         …        …     …

Toh et al, Med Care, 2013;51:S4-S10

44

Summary score-matched analysis

[Site 1 patient-level dataset with PS, as on slide 44]

Summary-level dataset shared by Site 1 (PT = person-time):

PT in Exposed  PT in Unexposed  Event in Exposed  Event in Unexposed
355.6          233.4            40                35

• Only four numbers are needed (in 1:1 matching)

• Lead team uses data from all sites to obtain overall results

Toh et al, Med Care, 2013;51:S4-S10

45
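
As a rough illustration of what the lead team can do with four numbers per site, the sketch below computes an incidence rate ratio and a Wald 95% confidence interval on the log scale. This is an assumption made for the example, not necessarily the estimator used in the cited paper, which may account for the matching differently; the function name `rate_ratio` is hypothetical.

```python
import math


def rate_ratio(pt_exposed, pt_unexposed, events_exposed, events_unexposed):
    """Incidence rate ratio (exposed vs. unexposed) with a Wald 95% CI."""
    rr = (events_exposed / pt_exposed) / (events_unexposed / pt_unexposed)
    # Standard error of the log rate ratio depends only on the event counts.
    se_log = math.sqrt(1 / events_exposed + 1 / events_unexposed)
    lower = math.exp(math.log(rr) - 1.96 * se_log)
    upper = math.exp(math.log(rr) + 1.96 * se_log)
    return rr, lower, upper


# The four Site 1 numbers shown on the slide.
rr, lower, upper = rate_ratio(355.6, 233.4, 40, 35)
```

With the Site 1 numbers, the rate ratio is about 0.75. The slide does not say how the lead team combines sites; summing the four numbers across sites or stratifying by site are two options.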

Summary score-stratified analysis

[Site 1 patient-level dataset with PS, as on slide 44]

Summary-level dataset shared by Site 1:

PS stratum  PT in Exposed  PT in Unexposed  Event in Exposed  Event in Unexposed
1           34.5           70.1             10                8
2           32.4           32.6             7                 21
3           56.2           44.2             9                 10
4           12.8           56.2             12                6

• Each record is a summary score-based stratum

• Lead team uses methods, e.g., the Mantel-Haenszel method, to obtain overall results

Toh et al, Med Care, 2013;51:S4-S10

46
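
A minimal sketch of the Mantel-Haenszel method for person-time data, applied to the four Site 1 strata on the slide. The function name and tuple layout are hypothetical, and the slide does not state which variant of the method the lead team uses; this is the standard Mantel-Haenszel rate ratio.

```python
def mantel_haenszel_rate_ratio(strata):
    """Mantel-Haenszel rate ratio for person-time data.

    strata: iterable of (pt_exposed, pt_unexposed, events_exposed, events_unexposed).
    """
    numerator = 0.0
    denominator = 0.0
    for pt_exposed, pt_unexposed, events_exposed, events_unexposed in strata:
        total_pt = pt_exposed + pt_unexposed
        # Each stratum is weighted by the other group's share of person-time.
        numerator += events_exposed * pt_unexposed / total_pt
        denominator += events_unexposed * pt_exposed / total_pt
    return numerator / denominator


# The four PS strata from Site 1 on the slide.
site1_strata = [
    (34.5, 70.1, 10, 8),
    (32.4, 32.6, 7, 21),
    (56.2, 44.2, 9, 10),
    (12.8, 56.2, 12, 6),
]
mh_rr = mantel_haenszel_rate_ratio(site1_strata)
```

To obtain an overall result, the lead team could pass the strata from all sites to the same function, treating each site-by-stratum combination as its own stratum.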

Meta-analysis

[Same Site 1 table as on slide 23: one row per patient, with Exposure, Outcome, Time, and individual covariates X1 to X5]

Summary-level dataset shared by Site 1:

HR    Lower 95% CI  Upper 95% CI
2.97  1.95          4.52

• Each record is an effect estimate and its 95% CI

• There is only 1 record per site

• Lead team uses meta-analytic approach to obtain overall results

Toh et al, Med Care, 2013;51:S4-S10

47

Distributed regression

[Same Site 1 table as on slide 23: one row per patient, with Exposure, Outcome, Time, and individual covariates X1 to X5]

Summary-level dataset shared by Site 1:

Type  Name  INT     X1      X2
SSCP  INT   152.45  56.74   121.65
SSCP  X1    56.74   342.45  88.55
SSCP  X2    121.65  88.55   422.32
Mean        1.00    3.45    65.78
STD         0.00    4.65    22.34
N           500     500     500

Karr et al, J Comput Graph Stat, 2005;14:263-279

• Each record is a summary statistic

• Lead team uses the summary statistics to perform regression analysis

48
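
For linear regression, the idea of sharing sums of squares and cross-products instead of patient-level rows can be sketched as follows. Each site computes X'X and X'y locally, the lead team adds them up, and the summed matrices give the same coefficients as a pooled fit. This is a simplified illustration under the assumption of ordinary least squares with two covariates; it is not the protocol from the cited paper, and all function names are hypothetical.

```python
def site_summaries(rows):
    """Run at each site. rows: list of (x1, x2, y). Returns (X'X, X'y) with an intercept."""
    p = 3  # intercept, x1, x2
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for x1, x2, y in rows:
        x = (1.0, x1, x2)
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def add_summaries(summaries):
    """Run by the lead team: element-wise sum of the site-level matrices."""
    p = len(summaries[0][1])
    xtx = [[sum(s[0][i][j] for s in summaries) for j in range(p)] for i in range(p)]
    xty = [sum(s[1][i] for s in summaries) for i in range(p)]
    return xtx, xty


def solve(a, b):
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - tail) / m[i][i]
    return beta


# Toy data generated from y = 1 + 2*x1 - 0.5*x2 (no noise), split across two sites.
site_a = [(0, 1, 0.5), (1, 0, 3.0), (2, 1, 4.5)]
site_b = [(3, 0, 7.0), (1, 2, 2.0), (4, 3, 7.5)]

pooled_xtx, pooled_xty = add_summaries([site_summaries(site_a), site_summaries(site_b)])
beta = solve(pooled_xtx, pooled_xty)  # intercept, coefficient of x1, coefficient of x2
```

Logistic and Cox regression have no closed-form solution, so a distributed version would need several rounds of exchanging intermediate statistics rather than a single exchange.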

Example: A comparative effectiveness study

• A Scalable Partnering Network for CER (SPAN) project

• Risk of long-term re-hospitalization with lap band vs. bypass procedure

• Included 7 of 11 data partners

Toh et al, Med Care, 2014;52:664-668

50

Study setting

http://www.hopkinsmedicine.org/healthlibrary/test_procedures/gastroenterology/laparoscopic_adjustable_gastric_banding_135,63/

http://www.hopkinsmedicine.org/healthlibrary/test_procedures/gastroenterology/roux-en-y_gastric_bypass_weight-loss_surgery_135,65/

51

Study design

[Timeline diagram, 1/1/2005 to 12/31/2010: a 365-day period precedes the index bariatric hospitalization; follow-up starts at the discharge date, and patients contribute person-time from that point]

Inclusion criteria:
• ≥21 years at time of bariatric surgery
• ≥1 BMI of 35 kg/m2 or greater
• Continuous enrollment w/ benefits
• No prior bariatric surgery
• No prior diagnosis of study outcome

Follow-up ends at:
• Re-hospitalization
• Death
• Health plan disenrollment
• 12/31/2010
• 730 days of follow-up

Toh et al, Med Care, 2014;52:664-668

52

Confounders

• Age
• Sex
• Race/ethnicity
• Diabetes*
• Baseline BMI*
• Year of procedure
• Charlson comorbidity score*
• Atrial fibrillation*
• GERD*
• Hypertension*
• Sleep Apnea*
• Asthma*
• Deep vein thrombosis*
• Pulmonary embolism*
• Congestive heart failure*
• Hyperlipidemia*
• Coronary artery disease*
• Oxygen use*
• Assistive walking device*
• Smoking status*
• Blood pressure*
• Length of stay assoc. with procedure

*Identified during the 365-day baseline period prior to the index bariatric hospitalization

Toh et al, Med Care, 2014;52:664-668

53

Statistical analysis

• Propensity score stratification

• Analysis
  • Pooled patient-level data analysis (benchmark)
  • Risk set-based analysis
  • PS-stratified analysis (by quintile)
  • Meta-analysis of site-specific effect estimates

Toh et al, Med Care, 2014;52:664-668

54

Selected baseline patient characteristics

                             Adjustable gastric band (n=1,550)  Roux-en-Y gastric bypass (n=5,792)
Characteristics              N        %*                        N        %*
Mean age (SD)                46.7     11.2                      45.7     10.7
Age > 65 years               76       4.9                       141      2.4
Female sex                   1,266    81.7                      4,823    83.3
Race/ethnicity
  Black or African American  137      8.8                       522      9.0
  White                      1,130    72.9                      3,840    66.3
  Hispanic                   142      9.2                       769      13.3
  Other                      62       4.0                       280      4.8
  Unknown                    79       5.1                       381      6.6
Baseline BMI
  30-34.9                    96       6.2                       174      3.0
  35-39.9                    480      31.0                      1,410    24.3
  40-49.9                    813      52.4                      3,126    54.0
  ≥50                        161      10.4                      1,082    18.7

Toh et al, Med Care, 2014;52:664-668

55

Patient-level data analysis, by site

Site    Adjusted HR  95% CI
Site 1  0.68         0.45, 1.02
Site 2  0.65         0.37, 1.15
Site 3  0.52         0.26, 1.04
Site 4  0.72         0.35, 1.50
Site 5  0.82         0.46, 1.48
Site 6  0.32         0.13, 0.75
Site 7  0.79         0.62, 1.01

Toh et al, Med Care, 2014;52:664-668

56
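
The seven site-specific estimates in the by-site table are all that a meta-analysis needs. The sketch below pools them with a fixed-effect, inverse-variance model, back-calculating each standard error from the reported 95% CI. Both choices are assumptions for illustration: the slides do not state which meta-analytic model the study used, and the function name is hypothetical.

```python
import math

# (adjusted HR, lower 95% CI, upper 95% CI) for Sites 1 to 7, from the by-site table.
site_results = [
    (0.68, 0.45, 1.02),
    (0.65, 0.37, 1.15),
    (0.52, 0.26, 1.04),
    (0.72, 0.35, 1.50),
    (0.82, 0.46, 1.48),
    (0.32, 0.13, 0.75),
    (0.79, 0.62, 1.01),
]


def fixed_effect_meta(results, z=1.96):
    """Inverse-variance pooling of hazard ratios on the log scale."""
    total_weight = 0.0
    weighted_sum = 0.0
    for hr, lower, upper in results:
        # The CI width on the log scale equals 2 * z * SE.
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        weight = 1.0 / se ** 2
        total_weight += weight
        weighted_sum += weight * math.log(hr)
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * pooled_se),
        math.exp(pooled_log + z * pooled_se),
    )


pooled_hr, pooled_lower, pooled_upper = fixed_effect_meta(site_results)
```

Under these assumptions the pooled estimate rounds to 0.71 (0.60, 0.84), in line with the meta-analysis row of the "Results by method" slide.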

Overall results by method

Toh et al, Med Care, 2014;52:664-668

57

Results by method

Method             Adjusted HR  95% CI
Benchmark          0.71         0.59, 0.84
Risk set analysis  0.71         0.59, 0.84
PS stratification  0.70         0.59, 0.83
Meta-analysis      0.71         0.60, 0.84

Toh et al, Med Care, 2014;52:664-668

58

Pooled patient-level linear regression (from PROC REG)

Distributed linear regression

59

Pooled patient-level logistic regression (from PROC LOGISTIC)

Distributed logistic regression

60

Pooled patient-level Cox PH regression (from PROC PHREG)

Distributed Cox PH regression

61

Data-sharing methods in multi-database studies

Data shared across sites

• Patient-level data
  • Individual covariates
  • Confounder summary scores
  • A hybrid of above

• Summary-level data
  • Stratum-specific counts
  • Risk-set data
  • Intermediate statistics
  • Database-specific effect estimates

62

Analytic flexibility vs. privacy protection

[Figure: data-sharing approaches plotted by analytic flexibility (vertical axis) against privacy protection (horizontal axis). Approaches shown: patient-level info with individual covariates; patient-level info with summary scores; risk-set data; summary statistics; stratum-specific counts; database-specific effect estimates]

* Approximation

64

Discussion

• Stakeholders are willing to share data if:
  • Benefits of research outweigh the risks
  • Risks are minimized
  • Cost is reasonable

• Although we did not spend much time on it here, proper governance of data sharing is critical

66

Discussion

• Use of a distributed data network structure and privacy-protecting analytic methods allows analysis of multiple databases while protecting patient privacy

67

A national DDN infrastructure for evidence generation

[Diagram: a network of organizations: Health Plans 1 to 9, Hospitals 1 to 6, and Outpatient clinics 1 to 6]

• Each organization can participate in multiple networks

• Each network controls its governance and coordination

• Networks share infrastructure, analytics, lessons, security, software

68

Summary

• Sending analysis to the data

• Sharing information, not data

• Getting more by asking for less

69

Acknowledgments

• HPHCI: Jeffrey Brown, Mia Gallagher, Qoua Her, Xiaojuan Li, Sarah Malek, Jessica Malenfant, Richard Platt, Yury Vilk, Jessica Young, Zilu Zhang

• Others: Susan Gruber, Bruce Fireman, Lingling Li, Kazuki Yoshida

• Penn State University: Aleksandra Slavković, Yuji Samizo

• PCORnet: David Arterburn, Jason Block, Jane Anau, Yates Coley, Casie Horgan, Kathleen McTigue, Erick Moyneur, Roy Pardee, Juliane Reynolds, Sheryl Rifas-Shiman, Jessica Sturtevant, Robert Wellman, many others

70