Measuring reliability and validity in human coding and machine classification


Dr. Stuart Shulman

May 2, 2014 CAQDAS Conference 2014

“…a wealth of information creates a poverty of attention.” - Herbert Simon, 1971

Acknowledgements

•  This research has been supported by grants from the National Science Foundation (NSF) and was supplemented through interagency agreements between the US Environmental Protection Agency, the US Fish & Wildlife Service, and the NSF.
–  EIA 0089892 (2001-2002): “SGER Citizen Agenda-Setting in the Regulatory Process: Electronic Collection and Synthesis of Public Commentary”
–  EIA 0327979 (2003-2004): “SGER Collaborative: A Testbed for eRulemaking Data”
–  SES 0322662 (2003-2005): “Democracy and E-Rulemaking: Comparing Traditional vs. Electronic Comment from a Discursive Democratic Framework”
–  IIS 0429293 (2004-2007): “Collaborative Research: Language Processing Technology for Electronic Rulemaking”
–  SES-0620673 (2007): “Coding across the Disciplines: A Project-Based Workshop on Manual Text Annotation Techniques”
–  IIS-0705566 (2007-2010): “Collaborative Research III-COR: From a Pile of Documents to a Collection of Information: A Framework for Multi-Dimensional Text Analysis”

•  Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the National Science Foundation.

An Incredibly Important Book

Qualitative Methods: Genes, Taste, or Tactic?

•  Qualitative by birth or choice?
–  Some look to words as an alternative to number crunching
–  Others are rooted in rich and meaningful interpretive traditions

•  Another group is fluent in both qual & quant
–  Mixed methods open up rather than limit fields of knowledge

•  One central goal is valid inferences about phenomena
–  Replicable and transparent methods
–  Attention to error and corrective measures
–  Internal and external validation of results

•  Using computers for qualitative data analysis helps, but…
–  Rigor still originates with the research design, not the technology
–  Software makes better organization and efficiency possible
–  Coders enable the researcher to step back while scaling up
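One way to make "attention to error" concrete when two coders label the same items is a chance-corrected agreement statistic. The sketch below computes Cohen's kappa, a common choice; the slides do not name a specific statistic, so the measure, the function name `cohens_kappa`, and the example labels are all illustrative assumptions.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders (hypothetical helper).

    Both arguments are equal-length sequences of category labels,
    one label per coded item.
    """
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("Both coders must label the same non-empty set of items")

    n = len(coder_a)

    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Expected agreement if each coder assigned labels independently
    # at their own observed base rates.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both coders used a single identical label throughout;
        # kappa is undefined, so report perfect agreement by convention.
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Illustrative labels only (not data from the talk).
example_a = ["pro", "pro", "con", "con", "pro", "con", "pro", "con"]
example_b = ["pro", "con", "con", "con", "pro", "con", "pro", "pro"]
example_kappa = cohens_kappa(example_a, example_b)
```

In this made-up example the coders agree on 6 of 8 items (0.75), chance agreement is 0.5, and kappa is 0.5. Raw percent agreement alone would overstate reliability, which is why a chance correction is often reported alongside it.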

Purist Pluralist Positivist

A spectrum of approaches to working with qualitative data
Different types of knowledge claims depending on where you sit

deep immersion closeness to data

antipathy to numbers credible interpretation

in-depth analysis contextual subjective

experimental mixed method adaptive hybrid flexible approach interdisciplinary

quantitative focus on error

measurement critical validity and reliability

replication & objectivity generalization hypotheses

These choices are philosophical, ideological, political and ethical

Emergent properties found in very well-read texts, such as the character type “extremist agent of the law”

Agenda-setting in the press

Relations between Classes

Rates and Terms for Credit

Farm Profitability

Cost of Living

Soil Fertility

Education

Exploration Speculation Coding

Validation

Skip Ahead 10 Years: Display Ideas Using IR & NLP Techniques

•  Information Retrieval (IR)
–  Search and cluster topics and cross-correlate by stakeholders

•  Natural Language Processing (NLP)
–  Grouped by opinion and writer type

[Figure: chart with Con and Pro series; axis from 5,000 to 25,000]

Par 2.2(a1)
•  Con:
–  150, 818: “impossible to maintain”
–  272: “too expensive for elderly”
•  Pro:
–  169, 213, 391, 392, 394: “already being done in Alaska”
–  18: “extend to children”
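The display above groups short excerpts under Con and Pro for one paragraph of a rule. A minimal sketch of how such a grouping could be built from coded excerpts follows. It assumes the numbers are comment identifiers, which the slide does not state, and the names `coded_excerpts` and `group_by_stance` are hypothetical.

```python
from collections import defaultdict

# Each tuple is (comment_id, stance, quote), copied from the slide.
# Treating the numbers as comment IDs is an assumption.
coded_excerpts = [
    (150, "con", "impossible to maintain"),
    (818, "con", "impossible to maintain"),
    (272, "con", "too expensive for elderly"),
    (169, "pro", "already being done in Alaska"),
    (213, "pro", "already being done in Alaska"),
    (391, "pro", "already being done in Alaska"),
    (392, "pro", "already being done in Alaska"),
    (394, "pro", "already being done in Alaska"),
    (18, "pro", "extend to children"),
]


def group_by_stance(excerpts):
    """Group comment IDs by stance, then by the shared quote (hypothetical helper)."""
    grouped = defaultdict(lambda: defaultdict(list))
    for comment_id, stance, quote in excerpts:
        grouped[stance][quote].append(comment_id)
    # Convert to plain dicts with sorted ID lists for stable display.
    return {
        stance: {quote: sorted(ids) for quote, ids in quotes.items()}
        for stance, quotes in grouped.items()
    }


summary = group_by_stance(coded_excerpts)
for stance, quotes in summary.items():
    print(stance.capitalize() + ":")
    for quote, ids in quotes.items():
        print("  " + ", ".join(str(i) for i in ids) + ': "' + quote + '"')
```

Keeping the IDs attached to each quote preserves the path from a summary claim back to the source comments, which supports the transparency and replicability goals named earlier in the talk.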


Stuart W. Shulman. 2003. "An Experiment in Digital Government at the United States National Organic Program," Agriculture and Human Values 20(3), 253-265.

Coding Web Sites and Focus Groups to Study Agenda-Setting

Annotation to Improve Optical Character Recognition

Over 13,000 hours of video and audio were recorded of the public spaces in an LTC facility’s dementia unit in suburban Pittsburgh, PA. A codebook of 80+ codes was developed to categorize the behavior of the consenting residents and staff (only in relation to patients). 22 coders spent more than 4,400 hours over a period of 22 months coding the video data. The data were coded using the Informedia Digital Video Library (IDVL), an interface designed by computer scientists at Carnegie Mellon University.

http://cat.ucsur.pitt.edu

Dr. Stuart W. Shulman
Founder & CEO, Texifter, LLC
Research Associate Professor, Department of Political Science, University of Massachusetts Amherst
Director, Qualitative Data Analysis Program (QDAP)
Associate Director, National Center for Digital Government
Editor Emeritus, Journal of Information Technology & Politics
[email protected]
http://people.umass.edu/stu/
@stuartwshulman
