Measuring reliability and validity in human coding and machine classification
![Page 1: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/1.jpg)
Measuring Reliability and Validity in Human Coding and Machine Classification
Dr. Stuart Shulman
May 2, 2014 CAQDAS Conference 2014
“…a wealth of information creates a poverty of attention.” - Herbert Simon, 1971
![Page 2: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/2.jpg)
![Page 3: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/3.jpg)
Acknowledgements
• This research has been supported by grants from the National Science Foundation (NSF) and was supplemented through interagency agreements between the US Environmental Protection Agency, the US Fish & Wildlife Service, and the NSF.
– EIA 0089892 (2001-2002): “SGER Citizen Agenda-Setting in the Regulatory Process: Electronic Collection and Synthesis of Public Commentary”
– EIA 0327979 (2003-2004): “SGER Collaborative: A Testbed for eRulemaking Data”
– SES 0322662 (2003-2005): “Democracy and E-Rulemaking: Comparing Traditional vs. Electronic Comment from a Discursive Democratic Framework”
– IIS 0429293 (2004-2007): “Collaborative Research: Language Processing Technology for Electronic Rulemaking”
– SES-0620673 (2007): “Coding across the Disciplines: A Project-Based Workshop on Manual Text Annotation Techniques”
– IIS-0705566 (2007-2010): “Collaborative Research III-COR: From a Pile of Documents to a Collection of Information: A Framework for Multi-Dimensional Text Analysis”
• Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the National Science Foundation.
![Page 4: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/4.jpg)
An Incredibly Important Book
![Page 5: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/5.jpg)
Qualitative Methods: Genes, Taste, or Tactic?
• Qualitative by birth or choice?
– Some look to words as an alternative to number crunching
– Others rooted in rich and meaningful interpretive traditions
• Another group is fluent in both qual & quant
– Mixed methods open up rather than limit fields of knowledge
• One central goal is valid inferences about phenomena
– Replicable and transparent methods
– Attention to error and corrective measures
– Internal and external validation of results
• Using computers for qualitative data analysis helps, but…
– Rigor still originates with the research design, not the technology
– Software makes better organization and efficiency possible
– Coders enable the researcher to step back while scaling up
![Page 6: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/6.jpg)
A spectrum of approaches to working with qualitative data
Different types of knowledge claims depending where you sit
• Purist: deep immersion; closeness to data; antipathy to numbers; credible interpretation; in-depth analysis; contextual; subjective
• Pluralist: experimental; mixed method; adaptive; hybrid; flexible approach; interdisciplinary
• Positivist: quantitative; focus on error; measurement critical; validity and reliability; replication & objectivity; generalization; hypotheses
These choices are philosophical, ideological, political and ethical
![Page 7: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/7.jpg)
Emergent properties found in very well-read texts, such as the character type “extremist agent of the law”
![Page 8: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/8.jpg)
Agenda-setting in the press
![Page 9: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/9.jpg)
Relations between Classes
Rates and Terms for Credit
Farm Profitability
Cost of Living
Soil Fertility
Education
Exploration Speculation Coding
Validation
![Page 10: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/10.jpg)
Skip Ahead 10 Years: Display Ideas Using IR & NLP Techniques
• Information Retrieval (IR)
– Search and cluster topics and cross-correlate by stakeholders
• Natural Language Processing (NLP)
– Grouped by opinion and writer type
[Chart: Con vs. Pro; axis ticks from 5,000 to 25,000]
Par 2.2(a1)
• Con:
– 150, 818: “impossible to maintain”
– 272: “too expensive for elderly”
• Pro:
– 169, 213, 391, 392, 394: “already being done in Alaska”
– 18: “extend to children”
![Page 11: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/11.jpg)
Stuart W. Shulman. 2003. "An Experiment in Digital Government at the United States National Organic Program," Agriculture and Human Values 20(3), 253-265.
![Page 12: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/12.jpg)
![Page 13: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/13.jpg)
![Page 14: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/14.jpg)
![Page 15: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/15.jpg)
![Page 16: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/16.jpg)
![Page 17: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/17.jpg)
![Page 18: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/18.jpg)
Coding Web Sites and Focus Groups to Study Agenda-Setting
![Page 19: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/19.jpg)
Annotation to Improve Optical Character Recognition
![Page 20: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/20.jpg)
Over 13,000 hours of video and audio were recorded of the public spaces in an LTC facility’s dementia unit in suburban Pittsburgh, PA. A codebook of 80+ codes was developed to categorize the behavior of the consenting residents and staff (only in relation to patients). 22 coders spent more than 4,400 hours over a period of 22 months coding the video data. The data were coded using the Informedia Digital Video Library (IDVL), an interface designed by computer scientists at Carnegie Mellon University.
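When many coders apply one codebook, agreement between coders is a common check on reliability. The sketch below is illustrative only: Cohen's kappa is a standard agreement statistic chosen here as an assumption, not a measure this slide names, and the function name and labels are hypothetical. It assumes two coders each assign exactly one code to the same items.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assign one code per item."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items where the two codes match.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: for each code, the product of the two coders'
    # marginal proportions, summed over all codes.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical code for every item
    return (observed - expected) / (1 - expected)


# Hypothetical labels, made up for illustration (not data from the study).
coder_a = ["pro", "con", "pro", "pro", "con", "con", "pro", "con"]
coder_b = ["pro", "con", "con", "pro", "con", "pro", "pro", "con"]
kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.2f}")
```

Here the coders agree on 6 of 8 items (0.75), chance agreement is 0.50, and kappa corrects the raw agreement for that chance component. With more than two coders or multiple codes per segment, a different statistic would be needed.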
![Page 21: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/21.jpg)
![Page 22: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/22.jpg)
http://cat.ucsur.pitt.edu
![Page 23: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/23.jpg)
![Page 24: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/24.jpg)
![Page 25: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/25.jpg)
![Page 26: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/26.jpg)
![Page 27: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/27.jpg)
![Page 28: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/28.jpg)
![Page 29: Measuring reliability and validity in human coding and machine classification](https://reader036.fdocuments.in/reader036/viewer/2022081401/5596df761a28ab344e8b45d1/html5/thumbnails/29.jpg)
Dr. Stuart W. Shulman
Founder & CEO, Texifter, LLC
Research Associate Professor, Department of Political Science, University of Massachusetts Amherst
Director, Qualitative Data Analysis Program (QDAP)
Associate Director, National Center for Digital Government
Editor Emeritus, Journal of Information Technology & Politics
[email protected]
http://people.umass.edu/stu/
@stuartwshulman