Dealing with confidential research information - Anonymisation techniques and other measures to...

12
Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management and Sharing workshop Edinburgh, 17 June 2008

Transcript of Dealing with confidential research information - Anonymisation techniques and other measures to...

Page 1: Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management.

Dealing with confidential research information

-Anonymisation techniques and other

measures to enable using and sharing research data

Data Management and Sharing workshopEdinburgh, 17 June 2008

Page 2: Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management.

Using and sharing confidential

research data…obtained from people as participants

Requires a combination of:• discussing consent and confidentiality with

participants / respondents (dialogue)• anomymisation of data• user access restrictions

researchers only; use licence with confidentiality agreement; data unavailable for certain time period

What is required depends on:• nature of research• planned data uses• is study specific

Page 3: Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management.

Identity disclosure

A person’s identity can be disclosed through: • direct identifiers

name, address, postcode, telephone number, voice, picture

usually NOT essential research information (administrative)

• indirect identifiers – possible disclosure in combination with other information occupation, geography, unique or exceptional values

(outliers) or characteristics

Page 4: Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management.

Why anonymise data?

• Ethical reasons – protect identity (sensitive, illegal, confidential

info)– disguise research location

• Commercial reasons

• Legal reasons – protect personal data (DPA)

Page 5: Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management.

Essential points

• Never disclose personal data (unless specific consent)

• Reasonable / appropriate level of anonymity

• Maintain maximum meaningful info

• Where possible replace rather than remove

• Identifying info may provide context, do not over-anonymise

• Re-users of data have the same legal and ethical obligation to NOT disclose confidential info as primary users

Page 6: Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management.

Anonymising quantitative data

• Remove direct identifiersnames, address, institution

• Reduce the variable precision through aggregationpostcode sector vs full postcode, birth year vs date of birth, occupational categories

• Generalise meaning of text occupational expertise

• Restrict upper / lower ranges to hide outliersincome, age

Page 7: Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management.

Relational data

Extra care needed - combinations of related datasets or a dataset in combination with publicly available info can disclose information e.g. businesses studied are mapped in publication

Page 8: Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management.

Geo-referenced data

Point data may reveal position of individuals, organisations, businesses, etc.

• Remove point coordinates – loss of all geographical info

• Reduce precision - replace point coordinates with line or polygon of larger areakm2 area, postcode district, ward, road

• Reduce precision - replace point coordinate with meaningful variable typifying the geographical position catchment area, poverty index, population density

But: geo-referenced data are valuable for re-use. Maintaining geo-references and imposing access restrictions is better

Page 9: Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management.

Anonymising qualitative data

• Plan or apply editing at startanonymise during transcription, highlight sensitive info for

later anonymising• Except: longitudinal studies - anonymise when data

collection complete (linkages)• Avoid blanking out information• Use pseudonyms or codes• Removing or aggregating identifiers in text can distort

data, make them unusable and unreliable or misleading - avoid over-anonymising

• Consistency within research team and throughout project

• [bracket] replacements for clarity • XML mark-up can be used for anonymisation (TEI tag)

<seg type="anonymised">word to be anonymised</seg>

Page 10: Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management.

Tips

• Always consider anonymisation together with consent agreements and user access restrictions

• Regulating / restricting user access may offer a better solution than anonymising

• Remove, mask, change identifiers

• Maintain maximum information

• Create log of all anonymisations

• Keep copy of original data

• Plan at start of research, not at the end

Example: Anonymisation log interview transcripts

Interview / Page Original Changed toInt1p1 Spain European p1 E-print Ltd Printing p2 20th J une June

p2 Amy MoiraInt2p1 Francis my friend

Page 11: Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management.

Sources

• Clark, A. 2006. Anonymising research data. NCRM Working Paper Series 7/06. ESRC National Centre for Research Methods.[http://www.ncrm.ac.uk/research/outputs/publications/WorkingPapers/2006/0706_anonymising_research_data.pdf]

• Economic and Social Data Services (ESDS) guidelines, UK Data Archive

• Inter-University Consortium for Political and Social Research (ICPSR). 2005. Guide to Social Science Data Preparation and Archiving: Best Practice Throughout the Data Life Cycle. 3rd Edition. ICPSR, Ann Arbor.

• Timescapes meetings & discussions

Page 12: Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management.

Exercises / scenarios

• Anonymising qualitative data: – Foot & mouth study Cumbria 2001-2003 (5407)– Conflicts and violence in prison (4596)

• Anonymising quantitative data: Labour Force Survey• Confidential relational and geo-referenced data: British

Household Panel Survey