Research Ethics and Use of Restricted Access Data

Research Using Restricted Data: Ethics and Access Issues California Center for Population Research May 19, 2014


Presentation given to the California Center for Population Research on principles of research ethics, data management for protection of privacy and confidentiality, and applying for access to restricted data in social science research.


Page 1: Research Ethics and Use of Restricted Access Data

Research Using Restricted Data: Ethics and Access Issues

California Center for Population Research

May 19, 2014


04/10/2023 CCPR - May 19, 2014 2

Goals for today’s presentation

• Discuss basic professional standards for researchers

• Discuss ways to manage data to protect privacy and confidentiality

• Using restricted-access data

Thanks to Shira Shafir, UCLA School of Public Health, for many of the slides


Professional Standards for Researchers

• "For individuals, research integrity is an aspect of moral character and experience. It involves above all a commitment to intellectual honesty and personal responsibility for one’s actions and to a range of practices that characterize responsible research conduct." (Taken from the National Academies of Sciences Report)

Source: Slide shared by Shira Shafir, UCLA School of Public Health


Professional Standards for Researchers

• The best ethical practices produce the best research.

• These practices include:

– Proficiency and fairness in peer review;
– Accuracy and fairness in representing contributions to research proposals and reports;
– Collegiality in interactions, communications and sharing of resources;
– Honesty and fairness in proposing, performing, and reporting research;
– Disclosure of conflicts of interest;
– Protection of human subjects in the conduct of research.

Source: Slide shared by Shira Shafir, UCLA School of Public Health


Collaborative Research

• Today, advances in research are rarely made by single investigators.

• Typically, collaboration allows the investigative team to ask powerful new questions, the answers to which would be otherwise unattainable.

Source: Slide shared by Shira Shafir, UCLA School of Public Health


Conflicts of Interest

• “Conflict of Interest” is a legal term that encompasses a wide spectrum of behaviors or actions involving personal gain or financial interest.

• A conflict of interest exists when an individual exploits, or appears to exploit, his or her position for personal gain or for the profit of a member of his or her immediate family or household.

Source: Slide shared by Shira Shafir, UCLA School of Public Health


Misconduct in Research

• Errors involving deception comprise the most serious category of errors.

• These types of errors involve:

– Violating the privacy and confidentiality of research subjects
– Making up data or results (fabrication)
– Changing or misreporting data or results (falsification)
– Using the ideas or words of another without giving appropriate credit (plagiarism)

• When in doubt, the government provides 26 excellent guidelines: http://ori.hhs.gov/plagiarism-0

Source: Slide shared by Shira Shafir, UCLA School of Public Health


Manage data to protect respondents

• Assess disclosure risk: “A disclosure risk occurs if an unacceptably narrow estimation of a respondent’s confidential information is possible or if exact disclosure is possible with a high level of confidence.” http://neon.vb.cbs.nl/casc/Glossary.htm

• Procedures

– Informed consent
– Remove identifiers
– Data manipulation
– Restrict access
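One rough way to make the disclosure-risk assessment above concrete is to count how many records share each combination of indirect identifiers; a record whose combination is rare is easier to re-identify. The sketch below is illustrative only: the field names, the sample records, and the `min_group_size` threshold are all assumptions, not part of the original slides or of any ICPSR procedure.

```python
from collections import Counter

# Hypothetical records: each dict stands for one respondent.
records = [
    {"state": "CA", "birth_year": 1980, "occupation": "teacher"},
    {"state": "CA", "birth_year": 1980, "occupation": "teacher"},
    {"state": "CA", "birth_year": 1975, "occupation": "judge"},
    {"state": "NV", "birth_year": 1980, "occupation": "teacher"},
]


def risky_records(rows, keys, min_group_size=2):
    """Return rows whose combination of `keys` is shared by fewer than
    `min_group_size` rows. The threshold is an illustrative assumption."""
    combos = Counter(tuple(row[k] for k in keys) for row in rows)
    return [
        row for row in rows
        if combos[tuple(row[k] for k in keys)] < min_group_size
    ]


# Rows with a unique (state, birth_year, occupation) combination are flagged.
flagged = risky_records(records, ["state", "birth_year", "occupation"])
```

Flagged rows are candidates for the procedures listed on this slide, such as recoding a variable or restricting access to the file.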


Informed consent

• “Informed consent is a process of communication between a subject and researcher to enable the person to decide voluntarily whether to participate in a study. Human subjects involved in a project must participate willingly and be adequately informed about the research. The informed consent must include a statement describing how the confidentiality of subject records will be maintained. However, it also is important that informed consent be written in a way that does not unduly limit an investigator's discretion to share data with the research community.”

Source: ICPSR web site on Confidentiality


Remove identifiers

• Direct identifiers. These are variables that point explicitly to particular individuals or units. Examples include:

– Names
– Addresses, including ZIP and other postal codes
– Telephone numbers, including area codes
– Social Security numbers
– Other linkable numbers such as driver's license numbers, certification numbers, etc.

• Indirect identifiers. These are variables that can be problematic as they may be used together or in conjunction with other information to identify individual respondents. Examples include:

– Detailed geographic information (e.g., state, county, province, or census tract of residence)
– Organizations to which the respondent belongs
– Educational institutions (from which the respondent graduated and year of graduation)
– Detailed occupational titles
– Place where respondent grew up
– Exact dates of events (birth, death, marriage, divorce)
– Detailed income
– Offices or posts held by respondent

Source: ICPSR web site on Confidentiality
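As a minimal sketch of removing direct identifiers, the snippet below drops a fixed set of fields from each record before release. The field names and the sample respondent are hypothetical; a real project would take the list from its own codebook. Note that this step does nothing about indirect identifiers, which need the recoding or access controls described elsewhere in this presentation.

```python
# Hypothetical field names for direct identifiers; adapt to the actual codebook.
DIRECT_IDENTIFIERS = {
    "name", "address", "zip_code", "phone", "ssn", "drivers_license",
}


def strip_direct_identifiers(row, identifiers=DIRECT_IDENTIFIERS):
    """Return a copy of `row` without any direct-identifier fields.
    The original record is left unchanged."""
    return {k: v for k, v in row.items() if k not in identifiers}


# Made-up respondent used only for illustration.
respondent = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "phone": "555-0100",
    "age": 34,
    "employed": True,
}
public_row = strip_direct_identifiers(respondent)
```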


Data Manipulation

• Recoding -- can include converting dates to time intervals, exact dates of birth to age groups, detailed geographic codes to broader levels of geography, and income to income ranges or categories.

• Removal -- eliminating the variable from the dataset entirely.

• Top-coding -- restricting the upper range of a variable.

• Collapsing and/or combining variables -- combining values of a single variable or merging data recorded in two or more variables into a new summary variable.

• Sampling -- rather than providing all of the original data, releasing a random sample of sufficient size to yield reasonable inferences.

• Swapping -- matching unique cases on the indirect identifier, then exchanging the values of key variables between the cases. This retains the analytic utility and covariate structure of the dataset while protecting subject confidentiality. Swapping is a service that archives may offer to limit disclosure risk. (For more in-depth discussion of this technique, see O’Rourke, 2003 and 2006.)

• Disturbing -- adding random variation or stochastic error to the variable. This retains the statistical properties between the variable and its covariates, while preventing someone from using the variable as a means for linking records.

Source: ICPSR Guide to Social Science Data Preparation and Archiving
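Three of the techniques above (recoding, top-coding, and disturbing) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ICPSR's procedure: the ten-year age bands, the 150,000 income cap, the Gaussian noise with standard deviation 2.0, and the sample rows are all hypothetical choices.

```python
import random


def recode_age(age, width=10):
    """Recoding: turn an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def top_code(value, cap):
    """Top-coding: report every value above `cap` as `cap`."""
    return min(value, cap)


def disturb(value, scale, rng):
    """Disturbing: add zero-mean Gaussian noise with standard deviation `scale`."""
    return value + rng.gauss(0.0, scale)


rng = random.Random(0)  # fixed seed only so the sketch is repeatable

# Made-up rows used only for illustration.
rows = [
    {"age": 34, "income": 52_000, "hours_worked": 40.0},
    {"age": 67, "income": 410_000, "hours_worked": 12.0},
]

released = [
    {
        "age_group": recode_age(r["age"]),
        "income": top_code(r["income"], cap=150_000),
        "hours_worked": disturb(r["hours_worked"], scale=2.0, rng=rng),
    }
    for r in rows
]
```

How coarse the bands are, where the cap sits, and how much noise is added all trade analytic utility against disclosure risk, so those parameters would normally be set after a disclosure review rather than picked ad hoc as they are here.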


Restrict Access

• “Restricted-use data are distributed in cases when removing potentially identifying information would significantly impair the analytic potential of the data. In other cases, data contain highly sensitive personal information and cannot be shared as a public-use file. In these cases, ICPSR provides access to a restricted-use version that retains the confidential data but requires controlled conditions for accessing them.”

Source: ICPSR Web site on confidentiality


Ways to restrict access

• ICPSR has established several mechanisms by which restricted-use data can be distributed:

• Secure online analysis (publicly available): This option provides immediate access to restricted-use data behind an analytic interface that has programmable disclosure protection.

• Secure online analysis (password protected): This option provides analysis of restricted-use data behind an interface with programmable disclosure protection for selected users. With this option, users may have to submit an application to access the data, or they may be part of a defined group, such as a research group.

• Restricted Use Data Agreement: With this option, users submit a request to access the data, and after approval, download the data using a single-use password or receive the data on CD-ROM.

• Virtual Data Enclave (VDE): The VDE is a secure, online environment via which approved users analyze restricted-use data using several software options available within the VDE, such as SAS, Stata, and SUDAAN.

• Physical Data Enclave: For highly restricted data, ICPSR has a physical enclave, which requires that approved users be on site at ICPSR to use the data. Data use in the physical data enclave is monitored by ICPSR staff.

Source: ICPSR Web site on confidentiality


Apply for Restricted Data Access

• Application for Restricted-Use Data. This includes information about the Investigator and the research project that requires access to the restricted-use data. The application may require the current CVs of all researchers who will be working on the project.

• Confidential Data Security Plan. The fundamental goals of this plan are to ensure that the restricted-use data are securely stored at the institution and accessible only to the people listed in the request. For some data security plans, questions are presented in such a way for the Investigator to describe in detail how this responsibility will be met. The questions and answers combined comprise the confidential data security plan. For some data security plans, static terms are presented to which the requester must agree.

• Restricted Data Use Agreement. This is a legal agreement between the University of Michigan and the Investigator's institution specifying the terms of the use of the restricted-use data.

• Supplemental Agreement with Research Staff Form. This identifies every person other than the Investigator who will have access to the restricted-use data. New research staff added in the course of a project must be added to and sign this form before they can access the data.

• Pledge of Confidentiality. The Investigator and all research staff must sign this pledge before they can access the data.

Source: ICPSR web site on restricted data access


Information needed to apply

• Name, department, and title of the Investigator

• Description of the proposed research that supports need to access restricted-use data

• Information on data formats needed and data-storage technology

• Approval or exemption for the research project from the Institutional Review Board of the Investigator's organization (for some restricted-use data)

• A Restricted Data Use Agreement signed by the Investigator and a legal representative from the Investigator's institution

• Other required information as specified in the Restricted Data Use Agreement

Source: ICPSR web site on restricted data access


Add Health example

http://www.cpc.unc.edu/projects/addhealth/data/restricteduse/security


National Center for Health Statistics - example http://www.cdc.gov/rdc/


General Social Survey - example

http://publicdata.norc.org:41000/gss/documents//OTHR/ObtainingGSSSensitiveDataFiles.pdf


Issues affecting access

• Changes to the “Common Rule” (also known as “The Federal Policy for the Protection of Human Subjects”), adopted by a number of federal agencies in 1991.

• Other uses of private data – Biosense http://www.cdc.gov/biosense/index.html

• Legislation – “Senators Intend to Amend Federal Student Privacy Law”

• Non-academic uses of personal data -- Top U.S. Retailers to Share Data in Fight on Cybercrime


How can the Data Archive help?

• Review data and documentation to be deposited

• Advise on:

– Data protection plans
– Data management plans
– Applying for restricted data access
– Identifying appropriate UCLA administrators to handle agreements


Contact Us

• Social Sciences Data Archive
• 1120-H Rolfe Hall
• [email protected]
• 310-825-0716