Introduction to ESDS Qualidata: Creating and delivering re-usable qualitative data Libby Bishop and...

Post on 28-Mar-2015

Introduction to ESDS Qualidata:

Creating and delivering re-usable qualitative data

Libby Bishop and Louise Corti, ESDS Qualidata

RC33 Amsterdam, August 2004

ESDS Qualidata

Qualitative data collections

• data from Economic and Social Research Council (ESRC) individual and programme research grant awards

• data from ‘classic’ social science studies

• other funders/sources

• focus on DIGITAL Collections, but also facilitate paper-based archiving

Types of qualitative data

• diverse data types: in-depth interviews; semi-structured interviews; focus groups; oral histories; mixed methods data; open-ended survey questions; case notes/records of meetings; diaries/research diaries

• multimedia: audio, video, photos and text (most common is interview transcriptions)

• formats: digital, paper, analogue audio-visual

Classic datasets

• Peter Townsend – Poverty, old age and Katherine Buildings

• Paul Thompson – oral history and Edwardians, social mobility

• Mildred Blaxter – Mothers and Daughters

• National Social Policy and Social Change Archive

Diverse uses for existing data

• enrich context description

• how was it really done: documentation of methods

– team ‘discussions’ about coding

– what, exactly, is ‘semi-structured’?

• augment data you collect

– historical comparative case

– expand sample size

• datasets for teaching

Are data always re-usable?

• restrictions on secondary analysis

• accessible

• coherent

• format

– medium

– layout

• processing before delivery

Good archiving = good research

• thorough documentation

• well organised and labelled files

• major stages of research recorded

• consent, copyright and related issues clarified

Characteristics of a good archived research collection

• intellectual content

• extensive raw data created

• supporting documentation

• consent

• transcription

• identifiers removed

• data listing

Intellectual content

• builds on previous research

• addresses new issues

• innovative approach to discipline

• innovative approach to qualitative methodology

Extensive raw data

• types of research data assembled

– in-depth interviews

– focus groups

– field notes/participant observation

– case study notes

• images and sound recordings

• range of material – broad focus

Describing qualitative data

• Full catalogue record

• Data listing (ID, biog details, date of interviews, media, format, transcript details)

• Online PDF User Guide

• Use/processing notes

• Archival listing for large collections

• CAT RECORD

Supporting documentation

• examples

– funding application

– description of methodology

– communication with informants on confidentiality

– coding schemes/themes

– technical details of equipment

– interview schedules

– end of award report

– documentation from CAQDAS software packages, e.g. analytical memos

– bibliographies, resulting publications

• Anything that adds insight or aids understanding and re-use

• USER GUIDE HERE

User Read File

UK DATA ARCHIVE DOCUMENTATION 4594 - Policing, Cultural Change and 'Structures of Feeling' in Post-War England, 1945-1999

Access conditions

Until 1 May 2008, the depositor's permission must be sought for access - please contact Qualidata at UKDA for further details. Users should note that no access at all is permitted to the Metropolitan Police Commissioner's interview transcript (int54) until 31st January 2005.

Conversion of data and documentation formats

All 65 interview/focus group transcript files were converted to both MS WORD 97 and rich text formats. Both the MS WORD 97 and the rich text files are available to users. The hard copy documentation was scanned and is available as a one volume Acrobat PDF user guide.

Anonymisation

Some limited edits have been made to interview transcripts during processing to protect the identity of respondents. Care has been taken to ensure that this does not compromise the quality of information available.

Data and documentation problems

There are some spelling mistakes in the interview transcriptions (left in situ due to limited processing resources), and the format transfer to Word has produced odd characters within the files in a very few cases. These issues should not present problems for secondary users.

Notes from data delivery and post-order corrections

Transcribing research 1

• integrated into the ongoing research

• full transcriptions or summaries

• avoid stockpiling

• costs and benefits

– self transcription

– internal team transcription

– external transcription

Transcribing research 2

• budget

– estimated number of interviews × 4 hours of transcription per 60-minute tape × hourly salary

• examples of good and bad

• full transcriptions

– consistent layout

– speaker tags

– line breaks

– header with identifier and other details

– checked for errors
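The budget rule of thumb above can be worked through as a small calculation. This is only a sketch: it assumes the rule means roughly 4 hours of transcription per 60-minute tape and one tape per interview, and the figures used (30 interviews, an hourly rate of 12.50) are hypothetical, not taken from the slides.

```python
def transcription_budget(num_interviews, hourly_salary, hours_per_tape=4.0):
    """Estimate transcription cost, assuming one 60-minute tape per interview.

    hours_per_tape is the assumed transcription time for a 60-minute tape.
    """
    return num_interviews * hours_per_tape * hourly_salary


# Hypothetical project: 30 interviews, transcriber paid 12.50 per hour.
cost = transcription_budget(num_interviews=30, hourly_salary=12.50)
print(f"Estimated transcription cost: {cost:.2f}")
```

Longer interviews or poor recordings would push the hours-per-tape figure up, so it is kept as a parameter rather than fixed.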

Example of good transcription

LP: And how long have you lived in this house?

4G: This house? Four years past April.

LP: And you said you were in, was it Ferrier?

4G: Ferrier Gardens.

LP: For twenty years?

4G: Twenty-four years. Twenty-two doon the stair, and two years up the stair.

Identifiers removed

• confidentiality respected

• anonymisation?

• problems of anonymisation

– applied too weakly

– applied too strongly

– timing

– potential for distortion

– examples

• user undertakings

• appropriate and sympathetic

Listing research data

• contents

• key elements

– general

– specific to project

• template approach

• point of entry

• DATA LIST HERE – EDWARD?
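A template approach to data listing could be sketched as a fixed set of columns written out as CSV. The column names below follow the data-listing elements named earlier in the slides (ID, biographical details, interview date, media, format, transcript details), but the exact labels, the function name `write_data_listing`, and the single example row are hypothetical placeholders, not a real ESDS Qualidata template or record.

```python
import csv
import io

# Illustrative column labels; a real template would add project-specific fields.
FIELDS = [
    "id",
    "biographical_details",
    "interview_date",
    "media",
    "format",
    "transcript_details",
]


def write_data_listing(rows):
    """Return a CSV data listing (header plus one line per interview)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder row for illustration only.
example_rows = [
    {
        "id": "int01",
        "biographical_details": "female, retired",
        "interview_date": "1999-03-02",
        "media": "audio cassette",
        "format": "RTF",
        "transcript_details": "full transcript",
    },
]

listing = write_data_listing(example_rows)
print(listing)
```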

Value of data properly prepared for re-use

• widely disseminated and accessible

• suitable formats for use and preservation

• coherent data and methodology

• appropriate for CAQDAS packages

Preparing qualitative data for sharing

• Sharing requires standards – XML mark-up

• Processing steps:

– Scan

– Optical character recognition (OCR)

– Proof

– Format

– XML mark-up

XML mark-up enables

• Access to content and structure

– Speaker tags

– Coded textual/audio data

– Links to contextual documentation: audio files; fieldnotes; photos; analytical annotations etc.

– Links to other sources via geo-referencing: micro data; aggregate statistics; maps; census data etc.

• Data providers to publish to online systems, such as ESDS Qualidata Online

• Meet needs of researchers requesting a standard they can follow

• Encourage more qualitative data analysis software (CAQDAS) companies to pursue XML outputs based on this standard

How we get from tifs to…

…XML mark-up ready for online

<u n="31">…

<s n="44">My father was, in the daytime he was a boilermaker on the old <name type="organisation">North <add place="supralinear">Staffordshire</add><del type="word change">Circular</del> Railway</name> and then every night he played in the theatre orchestra.</s>

…

<s n="46">And he <add place="supralinear">'d to go to</add><del>had got to be at</del> work at six the next morning! <note place="end of paragraph">Cornet player.</note></s>

</u>
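To illustrate what "access to content and structure" can mean in practice, the sketch below parses a shortened version of the fragment above with Python's standard `xml.etree.ElementTree`. It assumes one possible reading of the mark-up, namely that `<add>` text is kept and `<del>` text is dropped to obtain the corrected transcript; the helper name `corrected_text` is hypothetical and not part of any ESDS Qualidata tool.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the marked-up utterance, with plain ASCII quotes.
SAMPLE = (
    '<u n="31"><s n="44">My father was, in the daytime he was a boilermaker '
    'on the old <name type="organisation">North '
    '<add place="supralinear">Staffordshire</add>'
    '<del type="word change">Circular</del> Railway</name> '
    'and then every night he played in the theatre orchestra.</s></u>'
)


def corrected_text(elem):
    """Return the text under elem, keeping <add> content and dropping <del> content."""
    if elem.tag == "del":
        return ""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(corrected_text(child))
        # Text following a child element belongs to the parent, so keep it
        # even when the child itself is a deletion.
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(SAMPLE)

# Structure: pull out every organisation name mentioned in the utterance.
organisations = [
    corrected_text(name)
    for name in root.iter("name")
    if name.get("type") == "organisation"
]

# Content: the sentence as corrected by the transcriber.
sentence = corrected_text(root.find("s"))
print(organisations)
print(sentence)
```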

Word doc created from OCR

Issues in scanning and OCR

• Scanning done at 300 dpi, grey scale

• OCR quality varies hugely with the quality of the original; special challenges include (but are not limited to):

– Character recognition

– Stray marks on page

– Missing words

– Interviewer’s notes

– “Creative” character interpretation: section breaks, font changes, footnotes, super- and sub-scripts, and so on.

• Partially automated with macros, but much judgement (clerical and research) still required

Final Word file (human and Excel readable)

Using Excel macros to create XML transcript

Current final product: basic XML mark-up

<u id="96" who="subject">I would rather nae ken if I had cancer. I told my man that, I says "If I have cancer, don't tell me". I mean you might hae an idea yourself, but I wouldnae like to be telt. I told him that.</u>

<u id="97" who="interviewer">And how has your own health been over the years?</u>

<u id="98" who="subject">Och, up an' doon, y'ken.</u>

Need for publishing tools

• Once XML schema is more developed, next step is to develop publishing tools to automate as much of mark-up as possible

• Currently using simple scripts to find and mark <u> and <s>; much work still done manually

• Looking into options for automatic mark-up of some components (e.g. natural language processing and information extraction)

• Would like to work more closely with CAQDAS suppliers to ensure use of similar mark-up semantics
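The scripts mentioned above are not shown in the slides. As a rough illustration only, the sketch below turns speaker-tagged transcript lines into `<u>` elements with `id` and `who` attributes, in the style of the basic mark-up shown earlier. The `<transcript>` wrapper element, the function name `transcript_to_xml`, and the mapping from speaker tags to roles are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# A speaker tag, a colon, then the utterance text, e.g. "LP: And how long ...".
TURN = re.compile(r"^(?P<tag>\w+):\s*(?P<text>.+)$")


def transcript_to_xml(lines, roles, first_id=1):
    """Wrap each speaker turn in a <u> element with id and who attributes."""
    root = ET.Element("transcript")  # hypothetical wrapper element
    next_id = first_id
    for line in lines:
        match = TURN.match(line.strip())
        if match is None:
            # Untagged lines are skipped here; in practice they would be
            # flagged for manual checking.
            continue
        who = roles.get(match.group("tag"), "unknown")
        u = ET.SubElement(root, "u", {"id": str(next_id), "who": who})
        u.text = match.group("text")
        next_id += 1
    return root


lines = [
    "LP: And how long have you lived in this house?",
    "4G: This house? Four years past April.",
]
roles = {"LP": "interviewer", "4G": "subject"}  # assumed mapping

root = transcript_to_xml(lines, roles)
print(ET.tostring(root, encoding="unicode"))
```

A pattern this simple will also treat any line that begins with a word and a colon as a speaker turn, which is one reason manual checking remains necessary.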

ESDS Qualidata

• Contact

– ebishop@essex.ac.uk

– corti@essex.ac.uk

– www.esds.ac.uk/qualidata