HIPAA and its Implications on Epidemiological Research Using Large Databases K. Arnold Chan, MD, ScD...

Page 1

HIPAA and its Implications on Epidemiological Research Using Large Databases

K. Arnold Chan, MD, ScD
Harvard School of Public Health
Channing Laboratory, Brigham & Women’s Hospital and Harvard Medical School

Page 2

Brief outline of this presentation

● Using large linked automated data for public health research
● Data development processes to ensure HIPAA compliance
● Examples
● Some thoughts

Page 3

Two types of data for public health research

● Primary data
  – Prospectively collected
  – Well-designed data collection tool
  – Informed consent
● Secondary data
  – Data originally collected for other purposes
  – May be proprietary
  – Privacy and confidentiality (particularly important if no prior authorization)
  – Different data systems

Page 4

Large linked healthcare databases

● Health insurance claims data
  – Medicaid
  – Medicare
  – Managed Care Organizations (MCO)
● Automated medical records
● Hospital / Clinic IT systems
● Availability of written records
● Need to contact patients / individuals?

Page 5

Public health research within MCOs

● Harvard Community Health Plan (subsequently became Harvard Pilgrim HealthCare)

● Kaiser Permanente (several states)
● Group Health Cooperative (Seattle area)
● Others
● HMO Research Network
  – 10+ MCOs across the U.S.

Page 6

Public health research within MCOs

● Different types of MCOs
  – Group model
  – Staff model
  – Different relationship with hospitals
  – Implications on data access
● MCOs with research programs
  – Separate research departments
  – Full-time investigators and support staff

Page 7

Data elements in the MCO data

● Demographic information
● Membership
  – Start date, termination date, benefit plan, ...
● Office visits
  – Type of visit, diagnosis(es), special procedures
● Special examinations
  – Radiology, Laboratory examinations
● Hospitalizations
● Drug dispensings
● Linkable by a unique ID

Page 8

HIPAA and Research with Databases

● Authorization from individual research subjects not feasible

● Individual authorization may be waived by Institutional Review Board or Privacy Board
  – Minimal Risk
  – Data reported in aggregate fashion
    ● No single-case report
  – “Minimum necessary” principle
  – De-identification

Page 9

HIPAA and Research with Databases

● Single MCO studies
  – Investigators and research staff are MCO employees
● Multiple-MCO studies
  – May involve transferral of data across MCOs or to a Data Center
● Other types of studies not covered in this presentation
  – e.g. Generate a de-identified dataset for public or commercial use

Page 10

HIPAA and data development

● Do not move individual level data unless absolutely necessary
  – Generate summary tables at each study site
  – Combine the tables for final report
  – Smalley et al. Contraindicated use of cisapride: the impact of an FDA regulatory action. JAMA 2000; 284: 3036-9.
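
The "summary tables at each site, then combine" approach can be sketched as follows. This is a minimal illustration, not the procedure used in the cited study: the field names (`age_group`, `sex`, `exposed`) and the sample rows are hypothetical, and a real summary table would follow the study protocol.

```python
from collections import Counter

def summarize_site(records):
    """Build one site's summary table: subject counts per stratum."""
    return Counter((r["age_group"], r["sex"], r["exposed"]) for r in records)

def combine_tables(site_tables):
    """Add up the site-level tables; only counts cross site boundaries."""
    combined = Counter()
    for table in site_tables:
        combined.update(table)
    return combined

# Hypothetical individual-level rows; these never leave their own site.
site_a = [
    {"age_group": "65-69", "sex": "M", "exposed": True},
    {"age_group": "65-69", "sex": "M", "exposed": False},
]
site_b = [
    {"age_group": "65-69", "sex": "M", "exposed": True},
    {"age_group": "70-74", "sex": "F", "exposed": False},
]

final_table = combine_tables([summarize_site(site_a), summarize_site(site_b)])
```

Only the `Counter` objects would be transmitted, so the combined report is built without any site sharing individual-level rows.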

Page 12

HIPAA and data development

● Randomly generated Study ID to replace True ID
  – Crosswalk between the two stored at secured location
  – Destroy the crosswalk after successful linkage of data and quality check
  – Implications for storage and back-up
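
A minimal sketch of the Study ID step, under stated assumptions: the field name `member_id`, the 16-character hexadecimal ID format, and the sample rows are all hypothetical. Clearing the dictionary only stands in for secure destruction; any stored or backed-up copy of the crosswalk would need the same treatment.

```python
import secrets

def build_crosswalk(true_ids):
    """Map each True ID to a randomly generated, unique Study ID."""
    crosswalk = {}
    used = set()
    for true_id in true_ids:
        if true_id in crosswalk:
            continue  # same person seen again: keep the Study ID already assigned
        study_id = secrets.token_hex(8)  # 16 random hex characters
        while study_id in used:          # regenerate on the (unlikely) collision
            study_id = secrets.token_hex(8)
        used.add(study_id)
        crosswalk[true_id] = study_id
    return crosswalk

def pseudonymize(records, crosswalk, id_field="member_id"):
    """Return copies of the records with the True ID replaced by the Study ID."""
    out = []
    for rec in records:
        row = {k: v for k, v in rec.items() if k != id_field}
        row["study_id"] = crosswalk[rec[id_field]]
        out.append(row)
    return out

# Hypothetical source tables that share the same True ID.
dispensings = [
    {"member_id": "A001", "ndc": "55555-333-22"},
    {"member_id": "A002", "ndc": "55555-333-22"},
]
visits = [{"member_id": "A001", "dx": "xxx.yy"}]

crosswalk = build_crosswalk(r["member_id"] for r in dispensings + visits)
study_dispensings = pseudonymize(dispensings, crosswalk)
study_visits = pseudonymize(visits, crosswalk)

# After linkage and quality checks the crosswalk is discarded.
crosswalk.clear()
```

Because both tables are recoded with the same crosswalk, they remain linkable by Study ID after the True ID is gone.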

Page 13

HIPAA and data development

● Roll-up / transform variables
  – Age --> Age groups
  – National Drug Code --> Drug or Group of drugs
  – ICD-9 diagnosis code --> Disease

e.g. A man born on Dec 10, 1934 with diagnosis code xxx.yy received drug 55555-333-22
  – 65-70 y/o m with Heart Failure received Digoxin
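
The roll-up can be sketched in a few lines. Everything here is an assumption for illustration: the lookup tables contain only the placeholder codes from the slide, the reference date is arbitrary, and the non-overlapping 5-year age bands (e.g. 65-69) are one possible grouping rather than the one used on the slide.

```python
from datetime import date

# Hypothetical lookup tables keyed by the placeholder codes from the slide.
NDC_TO_DRUG = {"55555-333-22": "Digoxin"}
ICD9_TO_DISEASE = {"xxx.yy": "Heart Failure"}

def age_on(birth_date, ref_date):
    """Age in completed years on the reference date."""
    had_birthday = (ref_date.month, ref_date.day) >= (birth_date.month, birth_date.day)
    return ref_date.year - birth_date.year - (0 if had_birthday else 1)

def age_group(age, width=5):
    """Roll an exact age up to a band such as '65-69' (assumed banding)."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

def roll_up(record, ref_date):
    """Replace date of birth and raw codes with coarser categories."""
    return {
        "sex": record["sex"],
        "age_group": age_group(age_on(record["birth_date"], ref_date)),
        "disease": ICD9_TO_DISEASE[record["icd9"]],
        "drug": NDC_TO_DRUG[record["ndc"]],
    }

raw = {
    "sex": "M",
    "birth_date": date(1934, 12, 10),
    "icd9": "xxx.yy",
    "ndc": "55555-333-22",
}
rolled = roll_up(raw, ref_date=date(2003, 6, 1))  # arbitrary reference date
```

The rolled-up record carries no date of birth, NDC, or ICD-9 code, only the derived categories.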

Page 14

HIPAA and data development

● Preserve temporal sequence of events but disguise the real dates
● e.g. Drug use during pregnancy study
  – 29 year-old received 55555-333-22 on Nov 25, 1999 and delivered a baby on Dec 10, 1999
  -->
  – 26-30 year-old mother delivered in 1999, baby exposed to amoxicillin at -15 days
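
One way to keep the sequence of events while hiding calendar dates is to store each event as a signed day offset from an anchor event (here, delivery) and keep only the anchor year. This is a sketch under that assumption; the function name and record layout are hypothetical.

```python
from datetime import date

def to_relative(anchor_date, events):
    """Replace event dates with signed day offsets from the anchor date.

    Negative offsets fall before the anchor, positive offsets after it.
    Only the anchor year is retained, not the full date.
    """
    relative = [(label, (event_date - anchor_date).days) for label, event_date in events]
    return {
        "anchor_year": anchor_date.year,
        "events": sorted(relative, key=lambda item: item[1]),
    }

delivery = date(1999, 12, 10)
events = [("amoxicillin dispensed", date(1999, 11, 25))]
disguised = to_relative(delivery, events)
```

Sorting by offset preserves the temporal order, so exposure windows relative to delivery can still be analyzed without the real dates.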

Page 15

HIPAA and data development

● Only extract information relevant to the study
  – e.g. A study of osteoporosis does not require information on subjects' mental health status
● Co-morbid conditions may be relevant
  – Use proxy measures to describe level of comorbidity
    ● Charlson's Index (based on concomitant diagnoses)
    ● Chronic Disease Score (based on co-medications)
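
Extracting only study-relevant information can be enforced with an explicit allow-list of fields. The sketch below assumes a hypothetical osteoporosis study; the field names and the sample record are invented for illustration, and the comorbidity score is shown as a precomputed summary value rather than calculated here.

```python
# Hypothetical allow-list derived from the study protocol.
ALLOWED_FIELDS = {"study_id", "age_group", "sex", "fracture_dx", "comorbidity_score"}

def minimum_necessary(record, allowed=ALLOWED_FIELDS):
    """Keep only the fields the protocol calls for; drop everything else."""
    return {field: value for field, value in record.items() if field in allowed}

source_record = {
    "study_id": "a1b2c3",
    "age_group": "65-69",
    "sex": "F",
    "fracture_dx": True,
    "comorbidity_score": 2,        # proxy measure instead of individual diagnoses
    "mental_health_dx": "xxx.yy",  # not needed for this study, so never extracted
}
extract = minimum_necessary(source_record)
```

Carrying a single comorbidity score in place of the underlying diagnoses keeps the adjustment variable while leaving unrelated conditions out of the study file.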

Page 16

HIPAA and data development

● Geocoding
  – Describe socioeconomic status of study subjects based on census tract data
  – Send out (Study ID, address) to a geocoding firm
  – (Study ID, X1, X2, X3) returned
    ● X1 : education level
    ● X2 : income level
    ● X3 : race/ethnicity information
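
The exchange with the geocoding firm can be sketched as two small steps: build an outbound file that holds only (Study ID, address), then attach the returned (Study ID, X1, X2, X3) values to the analytic rows. All names and values below are hypothetical; the firm's actual file format is not described in the slides.

```python
def outbound_file(members, crosswalk):
    """Rows sent to the geocoding firm: Study ID and address, nothing clinical."""
    return [(crosswalk[m["member_id"]], m["address"]) for m in members]

def attach_census_measures(study_rows, returned):
    """Merge returned (study_id, x1, x2, x3) tuples into the analytic rows."""
    by_study_id = {sid: (x1, x2, x3) for sid, x1, x2, x3 in returned}
    enriched = []
    for row in study_rows:
        x1, x2, x3 = by_study_id[row["study_id"]]
        enriched.append({**row, "education": x1, "income": x2, "race_ethnicity": x3})
    return enriched

# Hypothetical inputs.
members = [{"member_id": "A001", "address": "1 Example St"}]
crosswalk = {"A001": "s-0001"}
study_rows = [{"study_id": "s-0001", "age_group": "65-69"}]

sent = outbound_file(members, crosswalk)
returned = [("s-0001", "tract education level", "tract income level", "tract race/ethnicity")]
analytic = attach_census_measures(study_rows, returned)
```

The firm sees an address but no health information, and the analytic file gains tract-level measures without ever containing the address.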

Page 17

An example

Finkelstein et al. Decreasing Antibiotic Use Among US Children: The Impact of Changing Diagnosis Patterns. Pediatrics 2003; 112: 620-7.

● Data elements involved
  – Date of birth, gender
  – Membership
  – Drug dispensings
  – Diagnoses in close proximity to antibiotic dispensings
● Data from nine MCOs

Page 18

Finkelstein et al. Pediatric antibiotics use study

● Data development at each MCO
  – Extract antibiotic use information
  – Extract diagnosis of interest (infections)
  – Use date of birth, gender, and membership data to calculate person-time of interest
● Refined, aggregate data forwarded to the Data Center
  – Rate of antibiotic use = # of antibiotic uses per 1,000 person-years, for each age-gender group
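
A sketch of how the Data Center could pool the aggregate tables: each site sends, per age-gender group, a count of antibiotic uses and the person-years of membership, and the pooled rate is total uses divided by total person-years, times 1,000. The strata labels and numbers are invented for illustration and are not results from the study.

```python
def rate_per_1000_py(events, person_years):
    """Events per 1,000 person-years."""
    return 1000.0 * events / person_years

def pooled_rates(site_tables):
    """Sum numerators and denominators across sites, then compute each rate."""
    totals = {}
    for table in site_tables:
        for stratum, (events, person_years) in table.items():
            e, py = totals.get(stratum, (0, 0.0))
            totals[stratum] = (e + events, py + person_years)
    return {stratum: rate_per_1000_py(e, py) for stratum, (e, py) in totals.items()}

# Hypothetical site-level tables: (age group, gender) -> (uses, person-years).
site_1 = {("0-2", "F"): (120, 100.0), ("3-5", "M"): (50, 100.0)}
site_2 = {("0-2", "F"): (180, 200.0)}

rates = pooled_rates([site_1, site_2])
```

Pooling the counts and person-years before dividing weights each site by its person-time, which differs from averaging the site-specific rates.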

Page 19

HIPAA and data development

● Individual identification is needed for certain types of research
  – Obtain medical records
  – Contact patient to conduct interview and/or request specimen
  – Linkage with external data
    ● Cancer registry
    ● National Death Index

Page 20

HIPAA and data development

● The process
  – Data extraction, transformation, reduction, and de-identification carried out at each MCO
  – Governed by State laws and local HIPAA-compliant Standard Operating Procedures
  – Principle of Limited Dataset / Minimum necessary
● The goal
  – Highly processed and de-identified data available for concatenation across study sites and complex analyses

Page 21

k-anonymity and large datasets

● The goal
  – A de-identified dataset at a certain level of individual anonymity

A 43 year-old man with hypertension, diabetes, and anxiety, taking atenolol, rosiglitazone, and lorazepam

vs.

A man 40-45 taking a beta-blocker and a thiazolidinedione
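
A dataset is k-anonymous with respect to a set of quasi-identifiers when every combination of their values is shared by at least k records. The check below is a minimal sketch; the rows are invented, and the choice of quasi-identifiers (sex, age, drug) is an assumption for illustration.

```python
from collections import Counter

def smallest_class_size(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    classes = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(classes.values()) if classes else 0

def is_k_anonymous(rows, quasi_identifiers, k):
    """True when every quasi-identifier combination occurs at least k times."""
    return smallest_class_size(rows, quasi_identifiers) >= k

# Detailed rows: exact age and specific drug make every record unique.
detailed = [
    {"sex": "M", "age": 43, "drug": "atenolol"},
    {"sex": "M", "age": 41, "drug": "atenolol"},
    {"sex": "M", "age": 44, "drug": "rosiglitazone"},
    {"sex": "M", "age": 42, "drug": "rosiglitazone"},
]
# Generalized rows: age band and drug class instead of exact values.
generalized = [
    {"sex": "M", "age": "40-45", "drug": "beta-blocker"},
    {"sex": "M", "age": "40-45", "drug": "beta-blocker"},
    {"sex": "M", "age": "40-45", "drug": "thiazolidinedione"},
    {"sex": "M", "age": "40-45", "drug": "thiazolidinedione"},
]

qi = ["sex", "age", "drug"]
detailed_ok = is_k_anonymous(detailed, qi, k=2)
generalized_ok = is_k_anonymous(generalized, qi, k=2)
```

Generalizing the values trades analytic detail for larger groups, which is the same trade-off as the two descriptions contrasted above.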

Page 22

HIPAA, Data Storage and Access

● Implications on Data Backup Plans
  – Data need to be destroyed after the report is published
● Data only used to support pre-defined analyses
● Ancillary analyses are possible after IRB review and approval

Page 23

Epidemiology studies using large databases

● In the old days ...
  – Give me all the data, do what I say ...
  – What if the investigator / reviewer wants to do THIS analysis?
  – Use existing datasets to test new hypotheses
● Good research practice
  – Define necessary data elements according to research protocol
  – Pre-defined analytic plan

Page 24

Epidemiology studies using large databases

● Keys to protection of human subjects
  – Competent, responsible investigators and staff
  – IRB review and oversight
  – Data development guidelines
    ● e.g. Good Epidemiology Practice
  – Information technology
● Some reasonable rules/guidelines are better than no guideline