Research Databases for NRES
JHC roles
1. Research chair at UoN – epidemiology, risk prediction and drug safety
2. Member of the ECC NIGB
3. Developed and run the not-for-profit QResearch database with EMIS
4. Inner city GP
Outline
• Background
• Key ethical issues
• - Scientific
• - Confidentiality
• Example of QResearch
• Data linkage and pseudonymisation
• Discussion / questions
Background
• Large volumes of electronic data now collected in the NHS
• Huge potential for useful research
• Technology exists to extract data and assemble it into databases
• Databases popular with academics and DH
• Large numbers for studies
• Relative efficiency
• Increasing potential for data linkages
Definition of research database in NRES SOP
• “a structured collection of individual level personal information, which is stored for potential research purposes beyond the life of a specific research project with defined end points”
• Includes databases set up for research
• Re-use of databases established for
• - audit
• - disease registers
Research databases
• Included in NRES SOPs
• Specific section within IRAS form
• Approvals generally for 5 years, renewable
• Can include generic approval
• Can include providing data to third parties as part of a research service
• Detailed protocol required on purpose, operation, methods, policies, governance
New research databases
• What is the purpose?
• Do we need a new one or can an existing database be used?
• Who will ‘own’ it and be responsible for it?
• What data will it contain and how will it be accessed?
• What is the governance framework?
• Will it contain identifiable data +/- consent?
• Is s251 support required?
Key objectives for safe data sharing
• Patient and their data
• Minimise risk
• Privacy
• Maximise public benefit
• Maintain public trust
Three main options for data access
• Consent
• Pseudonymisation
• s251
De-identification
• Various methods to reduce identifiability of data
• Pseudonymisation
• Use of samples and limited data items rather than whole database
• Conversion of date of birth to year of birth or age
• Contracts/data sharing agreements with clear liabilities and penalties
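As an illustration of two of the methods above (limited data items, and date of birth reduced to year of birth or age), here is a minimal Python sketch. The field names, the `deidentify` helper and the example record are hypothetical and are not taken from QResearch or any other system described in these slides.

```python
from datetime import date

def age_on(dob: date, reference: date) -> int:
    """Age in completed years on the reference date."""
    years = reference.year - dob.year
    # Subtract one if the birthday has not yet occurred in the reference year.
    if (reference.month, reference.day) < (dob.month, dob.day):
        years -= 1
    return years

def deidentify(record: dict, allowed_items: set, reference: date) -> dict:
    """Keep only the agreed data items and replace date of birth with
    year of birth and age (hypothetical helper, for illustration only)."""
    out = {k: v for k, v in record.items() if k in allowed_items}
    dob = record["dob"]
    out["year_of_birth"] = dob.year
    out["age"] = age_on(dob, reference)
    return out

# Made-up example record; the identifier and postcode are not real patient data.
record = {
    "nhs_number": "9434765919",
    "dob": date(1950, 6, 15),
    "sex": "F",
    "postcode": "NG7 2RD",
}
reduced = deidentify(record, allowed_items={"sex"}, reference=date(2011, 6, 14))
```

Only the items named in `allowed_items` survive, so strong identifiers such as the NHS number and postcode are dropped unless a project explicitly asks for them.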
Example for QResearch
• Established in 2002 to support ethical medical research
• Largest of three UK databases & expanding
• Management board – UoN and EMIS
• Advisory board – professional and lay representation. Advises on policy, strategy etc.
• Scientific Board – review science and risk assessment.
QResearch key facts
• Large pseudonymised database
• >700 GP practices, 14 million patients
• Patient and event level data
• Demographics – year of birth, sex, ethnicity
• Diagnoses, lab results, clinical values
• Medication, referrals
• No free text. No strong identifiers
• All research peer reviewed & published
QResearch uploads
• Informed consent from practice
• Practice displays notice in waiting room
• Practice activates upload software
• Data pseudonymised BEFORE data leaves practice
• Patients can be opted out of upload
• Secure upload to server at EMIS with full NHS security clearance
• Backups delivered to University
QResearch - security
• Full database stored on offline server
• Full encryption of hard drive
• Keypad-secured server room with limited access
• 24 hour CCTV with monitoring
• Confidentiality clauses in staff contracts
• Full log of all data accesses
• Log of all uses of data
• No data losses or breaches in 10 years
QResearch policy
• Whilst all data are pseudonymised, we apply the same safeguards as if they were identifiable
• To minimise any risk of re-identification of patients (and practices)
• To maintain public and professional trust
• Explicit policy to ensure all results of research studies are widely and freely available for public benefit.
Researcher access
• University based academics
• One must be GMC registered
• Standard application form
• Clarify research question and methods
• Independent scientific review
• Provided with sample size and data items needed to answer question
• Data only used for agreed purpose
• Data destroyed after project completed
Why is it important to ensure robust scientific methods
• Published research must give valid results which don’t mislead or misinform doctors, patients, policy makers
• Equally need to avoid unpublished research – eg a good study with important results
• Avoid duplication of effort
• Avoid publication bias
• Avoid suppression of unpopular results (eg side effects of medicines)
Ensuring scientific quality
1. Is there a clear research question?
2. Can the data answer the question?
3. Are the methods scientifically valid?
4. Are the results likely to be generalisable?
5. Does the team have the skills to do the project?
6. Is the researcher free to publish?
• Some databases with generic REC agreement will organise independent scientific review to answer the above.
Risk to confidentiality
• Each study needs risk assessment even if pseudonymised
• Could the study lead to identification of the patients because of
• - other data that the researcher might have
• - small numbers/rare events
• Minimise risk by de-identification of data
• Data sharing agreement & sanctions for misconduct
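One common way to handle the small numbers/rare events risk is to suppress low counts before results leave the research team. The Python sketch below is an illustration only: the threshold of 5 and the function name are assumptions, not a rule stated in these slides.

```python
from collections import Counter

def suppressed_counts(values, threshold=5):
    """Count occurrences of each category and mask any count below the
    threshold (threshold=5 is an assumed value, for illustration only)."""
    counts = Counter(values)
    # None marks a suppressed cell; real counts are kept when large enough.
    return {category: (n if n >= threshold else None)
            for category, n in counts.items()}

# Made-up example: one common and one rare diagnosis category.
diagnoses = ["asthma"] * 6 + ["rare_condition"] * 2
table = suppressed_counts(diagnoses)
```

Here the rare category is reported as suppressed rather than as a count of 2, which reduces the chance that a reader with other knowledge could single out an individual.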
QResearch data linkage study
• Linked to deprivation in 2002
• Linked to ONS cause of death in 2007
• Currently being linked to HES and cancer registry
• Testing out a new method of linkage using pseudonymised data
• Exceptionally high levels of valid, complete NHS numbers for ONS data, HES, GP data
Open pseudonymiser project
• Need approach which doesn’t extract identifiable data but still allows linkage
• Legal, ethical and NIGB approvals
• Secure, scalable
• Reliable, affordable
• Generates IDs which are unique to project
• Can be used by any set of organisations wishing to share data
• Pseudonymisation applied as close as possible to identifiable data, ie within clinical systems
Pseudonymisation: method
• Scrambles NHS number BEFORE extraction from clinical system
• Takes NHS number + project specific encrypted ‘salt code’
• One way hashing algorithm (SHA2-256) – no known collisions, and US standard from 2010
• Applied twice – before leaving clinical system & on receipt by next organisation
• Apply identical software to second dataset
• Allows two pseudonymised datasets to be linked
• Cannot be reverse engineered
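The steps above can be sketched in Python with the standard `hashlib` module. This is not the OpenPseudonymiser implementation: the function names, the order of concatenation, the use of a separate salt at the receiving organisation and the example values are all assumptions made for illustration, and the salt is shown already decrypted.

```python
import hashlib

def pseudonymise(identifier: str, salt: str) -> str:
    """One-way salted SHA-256 hash, returned as 64 hex characters.
    Concatenation order (identifier then salt) is an assumption."""
    return hashlib.sha256((identifier + salt).encode("utf-8")).hexdigest()

def to_linkable_id(nhs_number: str, project_salt: str, receiver_salt: str) -> str:
    """Apply the hash twice: once inside the clinical system and once on
    receipt by the next organisation (hypothetical two-stage helper)."""
    cleaned = nhs_number.replace(" ", "")            # normalise before hashing
    at_source = pseudonymise(cleaned, project_salt)  # before data leaves the source
    return pseudonymise(at_source, receiver_salt)    # on receipt

def link(dataset_a: dict, dataset_b: dict) -> dict:
    """Join two datasets on their shared pseudonymised IDs."""
    return {key: (dataset_a[key], dataset_b[key])
            for key in dataset_a.keys() & dataset_b.keys()}

# Made-up test identifier and salts; not real patient data.
gp = {to_linkable_id("943 476 5919", "project-A", "receiver-1"): "asthma"}
hes = {to_linkable_id("9434765919", "project-A", "receiver-1"): "admission"}
linked = link(gp, hes)
```

Because both datasets pass through identical software with the same salts, the same patient yields the same ID in each and the records link without either side holding the NHS number. A different project salt gives an unrelated ID, so IDs from one project cannot be matched against another.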
Web tool to create encrypted salt: proof of concept
• Web site private key used to encrypt user defined project specific salt
• Encrypted salt distributed to relevant data supplier with identifiable data
• Public key in supplier’s software to decrypt salt at run time and concatenate to NHS number (or equivalent)
• Hash then applied
• Resulting ID then unique to patient within project
Openpseudonymiser.org
• Website
• Desktop application
• Software for integration
• Test data
• Documentation
• Utility to generate encrypted salt codes
• Source code GNU GPL
Progress so far
• Pseudonymised entire
• - HES database since 1997
• - Cause of death data since 1993
• - Cancer registrations since 1990
• Linked all three datasets based only on pseudonymised NHS number, >99% complete
• Due to link GP data in Spring 2012
• Implementing into major GP computer systems
Key points
• Pseudonymisation at source
• Instead of extracting identifiers and storing lookup tables/keys centrally, the technology to generate the key is stored within the clinical systems
• Use of project specific encrypted salted hash ensures secure sets of ID unique to project
• Data controller retains full control
• Can work in addition to existing approaches
• Open source technology so transparent & free
Definition of clinical care team
• Important as determines whether s251 required
• Tendency by research community to adopt very broad definition to justify access
• Definition is tricky; as a guide:
• - Individual has a duty of care to patient
• - Has duty of confidence
• - Would be recognised in that role by a reasonable patient