Interpreting Kappa in Observational Research: Baserate Matters
Interpreting Kappa in Observational Research: Baserate Matters
Cornelia Taylor Bruckner, Vanderbilt University
Acknowledgements
• Paul Yoder
• Craig Kennedy
• Niels Waller
• Andrew Tomarken
• MRDD training grant
• KC Quant core
Overview
• Agreement is a proxy for accuracy
• Agreement statistics 101: chance agreement, the agreement matrix, and baserate
• Kappa and baserate, a paradox
• Estimating accuracy from kappa
• Applied example
Framing as observational coding
• I will frame the talk within observational measurement, but the concepts apply to many other situations, e.g.:
  Agreement between clinicians on a diagnosis
  Agreement between reporters on child symptoms (e.g., mothers and fathers)
“Rater accuracy”: A fictitious session
• Madeline Scientist writes a script for an interval-coded observation session, specifying the presence or absence of the target behavior in each interval.
• Two coders (Eager Beaver and Slack Jack), blind to the script, are asked to code the session.
• The accuracy of each coder against the script is calculated.
Accuracy of Eager Beaver (EB) with session (interval data)
|  | Eager Beaver: occurrence | Eager Beaver: nonoccurrence |
|---|---|---|
| True occurrences | .90 | .10 |
| True nonoccurrences | .01 | .99 |
Accuracy of Slack Jack (SJ) with session (interval data)
|  | Slack Jack: occurrence | Slack Jack: nonoccurrence |
|---|---|---|
| True occurrences | .50 | .50 |
| True nonoccurrences | .30 | .70 |
Who has the best accuracy?
• Eager Beaver, of course.
• Slack Jack was not very accurate.
• Notice that accuracy is about agreement with both the occurrence and nonoccurrence of behavior.
We don’t always know the truth
• It is great when we know the true occurrence and nonoccurrence of behaviors
• But, in the real world we deal with agreement between fallible observers
Agreement between raters
• Point-by-point interobserver agreement is achieved when independent observers see the same thing (behavior, event) at the same time.
Difference between agreement and accuracy
• Agreement can be directly measured.
• Accuracy cannot be directly measured; we don't know the "truth" of a session.
• However, agreement is used as a proxy for accuracy.
• Accuracy can be estimated from agreement; the method for this estimation is the focus of today's talk.
Percent agreement
• Percent agreement: the proportion of intervals that were agreed upon, i.e., agreements / (agreements + disagreements).
• Takes into account both occurrence and nonoccurrence agreement.
• Varies from 0 to 100%.
Occurrence and Nonoccurrence agreement
• Occurrence agreement: the proportion of intervals in which either coder recorded the behavior that were agreed upon (positive agreement).
• Nonoccurrence agreement: the proportion of intervals in which either coder recorded a nonoccurrence that were agreed upon (negative agreement).
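A minimal sketch of these three statistics computed from a 2x2 interval table; the cell counts here are hypothetical, not taken from the talk:

```python
# A 2x2 interval table for two coders; the cell counts are hypothetical.
a = 40   # both coders recorded an occurrence
b = 5    # coder 1 recorded an occurrence, coder 2 a nonoccurrence
c = 7    # coder 1 recorded a nonoccurrence, coder 2 an occurrence
d = 148  # both coders recorded a nonoccurrence
n = a + b + c + d

percent_agreement = (a + d) / n            # agreements / (agreements + disagreements)
occurrence_agreement = a / (a + b + c)     # positive agreement
nonoccurrence_agreement = d / (d + b + c)  # negative agreement

print(f"percent agreement:       {percent_agreement:.2%}")
print(f"occurrence agreement:    {occurrence_agreement:.2%}")
print(f"nonoccurrence agreement: {nonoccurrence_agreement:.2%}")
```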
Problem with agreement statistics
• We assume that agreement is due to accuracy
• Agreement statistics do not control for chance agreement
• So agreement could be due only to chance
Chance agreement and point by point agreement
[Figure: Value of IOA statistics when true accuracy is 80%. X-axis: % of intervals during which the behavior occurred (10%-80%); y-axis: value of the IOA statistic (0-0.7). Series: occurrence agreement and nonoccurrence agreement.]
Agreement matrix
| Eager Beaver (rows) / Slack Jack (columns) | happy | sad | angry | puzzled | other | Total |
|---|---|---|---|---|---|---|
| happy | 60 | 5 | 1 | 1 | 3 | 70 |
| sad | 1 | 40 | 4 | 2 | 0 | 47 |
| angry | 0 | 3 | 12 | 0 | 7 | 22 |
| puzzled | 5 | 5 | 4 | 30 | 6 | 50 |
| other | 0 | 0 | 0 | 1 | 10 | 11 |
| Total | 73 | 53 | 21 | 34 | 19 | 200 |
Using a 2x2 table to check agreement on individual codes
• When IOA is computed on the total code set, it is an omnibus measure of agreement.
• This does not tell us about agreement on any one code.
• To know agreement on a particular code, the confusion matrix needs to be collapsed into a 2x2 matrix (a short code sketch follows the tables below).
| Slack Jack (rows) / Eager Beaver (columns) | happy | sad | angry | puzzled | other | Total |
|---|---|---|---|---|---|---|
| happy | 60 | 9 | 1 | 0 | 0 | 70 |
| sad | 6 | 40 | 0 | 1 | 0 | 47 |
| angry | 0 | 7 | 12 | 2 | 1 | 22 |
| puzzled | 0 | 4 | 3 | 30 | 13 | 50 |
| other | 1 | 0 | 0 | 1 | 10 | 11 |
| Total | 67 | 60 | 16 | 39 | 24 | 200 |
| Slack Jack (rows) / Eager Beaver (columns) | happy | All other emotions | Total |
|---|---|---|---|
| happy | 60 | 10 | 70 |
| All other emotions | 7 | 123 | 130 |
| Total | 67 | 133 | 200 |
Baserate in a 2x2 table

| Slack Jack (rows) / Eager Beaver (columns) | Happy | All other emotions | Total |
|---|---|---|---|
| Happy | 60 | 10 | 70 |
| All other emotions | 7 | 123 | 130 |
| Total | 67 | 133 | 200 |

Best estimate of the base rate for Happy: (67 + 70)/(2 * 200) = .34
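Below is a minimal sketch of the collapse and of the base-rate estimate, assuming numpy and using the cell counts as transcribed above (a couple of the slide's printed marginals do not quite sum to the cells, so totals here come from the cells themselves):

```python
import numpy as np

codes = ["happy", "sad", "angry", "puzzled", "other"]
# Rows = Slack Jack, columns = Eager Beaver, transcribed from the slide.
counts = np.array([
    [60,  9,  1,  0,  0],
    [ 6, 40,  0,  1,  0],
    [ 0,  7, 12,  2,  1],
    [ 0,  4,  3, 30, 13],
    [ 1,  0,  0,  1, 10],
])

def collapse(matrix: np.ndarray, k: int) -> np.ndarray:
    """Collapse an m x m confusion matrix into a 2x2 table for code k (code k vs. all other codes)."""
    a = matrix[k, k]                 # both raters coded k
    b = matrix[k, :].sum() - a       # row rater coded k, column rater did not
    c = matrix[:, k].sum() - a       # column rater coded k, row rater did not
    d = matrix.sum() - a - b - c     # neither rater coded k
    return np.array([[a, b], [c, d]])

k = codes.index("happy")
two_by_two = collapse(counts, k)
n = two_by_two.sum()
row_total = two_by_two[0, :].sum()   # Slack Jack's "happy" total (70)
col_total = two_by_two[:, 0].sum()   # Eager Beaver's "happy" total (67)

# Best estimate of the base rate: average of the two raters' marginal
# proportions; the slide reports (67 + 70)/(2 * 200) = .34.
base_rate = (row_total + col_total) / (2 * n)
print(two_by_two)
print(f"estimated base rate for 'happy': {base_rate:.2f}")
```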
Review
• Defined accuracy
• Described the relationship between chance agreement and IOA
• Created a 2x2 table
• Calculated a best estimate of the base rate
Kappa
• Kappa is an agreement statistic that controls for chance agreement
• Before kappa, there was a sense that we should control for chance, but we did not know how.
• Cohen's 1960 paper has been cited over 7,000 times.
Definition of Kappa
• Kappa is the proportion of observed non-chance agreement out of all possible non-chance agreement:

K = (Po - Pe) / (1 - Pe)
Definition of Terms
• Po = the proportion of events for which there is observed agreement (the same metric as percent agreement).
• Pe = the proportion of events for which agreement would be expected by chance alone, defined as the probability of the two raters coding the same behavior at the same time by chance.
Agreement matrix for EB and SJ, with chance agreement in parentheses

| Slack Jack (rows) / Eager Beaver (columns) | Happy | All other emotions | Total |
|---|---|---|---|
| Happy | .36 (.33) | .36 | .72 |
| All other emotions | .09 | .18 (.15) | .28 |
| Total | .46 | .54 |  |

Po = .36 + .18 = .54; Pe = .33 + .15 = .48; K = (.54 - .48)/(1 - .48) = .12
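A minimal sketch of this calculation. The function derives Pe from each rater's marginal proportions; the first print simply re-checks the slide's Po and Pe, and the second applies the function to the happy vs. all-other-emotions counts from the earlier 2x2 table (a kappa the slides do not report):

```python
def cohens_kappa(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for a 2x2 table: a = both coded the behavior,
    d = neither did, b and c = the two kinds of disagreement."""
    n = a + b + c + d
    po = (a + d) / n                                # observed agreement
    p_row = (a + b) / n                             # rater 1's marginal proportion
    p_col = (a + c) / n                             # rater 2's marginal proportion
    pe = p_row * p_col + (1 - p_row) * (1 - p_col)  # chance agreement
    return (po - pe) / (1 - pe)

# The slide's worked example, using its reported Po = .54 and Pe = .48:
print((0.54 - 0.48) / (1 - 0.48))               # ~ 0.12

# Happy vs. all other emotions, counts 60, 10, 7, 123 from the earlier table:
print(round(cohens_kappa(60, 10, 7, 123), 2))   # ~ 0.81 (not reported on the slides)
```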
What determines the value of kappa
• Accuracy and base rate.
• Increasing accuracy increases observed agreement; therefore kappa is a consistent estimator of accuracy if the base rate is held constant.
• If accuracy is held constant, kappa will decrease as the estimated true base rate deviates from .5.
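A short sketch of this relationship under one simple assumed generating model, not necessarily the one behind the presenter's figures or handout: each observer independently codes every interval correctly with probability equal to the accuracy, for occurrences and nonoccurrences alike.

```python
def expected_kappa(accuracy: float, base_rate: float) -> float:
    """Expected kappa under the assumed model described above."""
    a, p = accuracy, base_rate
    po = a**2 + (1 - a)**2          # expected observed agreement
    q = p * a + (1 - p) * (1 - a)   # each observer's marginal rate of "occurrence"
    pe = q**2 + (1 - q)**2          # expected chance agreement
    return (po - pe) / (1 - pe)

for p in [0.1, 0.2, 0.3, 0.4, 0.5]:
    print(f"base rate {p:.1f}: kappa = {expected_kappa(0.80, p):.2f}")
# Under this model kappa is highest at a base rate of .5 and falls as the
# base rate moves toward 0 or 1, even though accuracy is constant at 80%.
```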
[Figure: Obtained kappa, across baserate, for 80% accuracy. X-axis: baserate (0.1-0.9); y-axis: obtained kappa (0-1). Series: accuracy = 80%.]
[Figure: Obtained kappa, across baserate, for 80% and 99% accuracy. X-axis: baserate (0.1-0.9); y-axis: obtained kappa (0-1). Series: accuracy = 80% and accuracy = 99%.]
[Figure: Obtained kappa, across baserate, from 80% to 99% accuracy. X-axis: baserate (0.1-0.9); y-axis: obtained kappa (0-1). Series: accuracy = 80%, 85%, 90%, 95%, and 99%.]
Bottom line
• When we observe behaviors that have a high or low baserate, our kappas will be low.
• This is important for researchers studying low-baserate behaviors: many of the behaviors we observe in young children with developmental disabilities are very low baserate.
Criterion values for IOA
• Cohen never suggested using criterion values for kappa.
• Many professional organizations recommend criteria for IOA.
• e.g., The Council for Exceptional Children, Division for Research Recommendations (2005): "Data are collected on the reliability or inter-observer agreement (IOA) associated with each dependent variable, and IOA levels meet minimal standards (e.g., IOA = 80%; Kappa = .60)"
Criterion accuracy?
• Setting a criterion for kappa independent of baserate is not useful.
• If we can estimate accuracy (and I am suggesting that we can), we need to consider what sufficient accuracy would be.
Criterion accuracy cont.
• If we consider 80% agreement sufficient, would we consider 80% accuracy sufficient?
• If we used 80% accuracy as a criterion, acceptable kappa could be as low as .19, depending on baserate.
Why it is really important not to use criterion kappas
• There is a belief that the quality of data will be higher if kappa is higher.
• This is only true if there is no associated loss of content or construct validity.
• The processes of collapsing and redefining codes often result in a loss of validity.
Applied example
• See handout for formulas and data
Use the table on the first page of your handout to determine the accuracy of the raters from the baserate and kappa:

| Baserate | Kappa | Accuracy |
|---|---|---|
| .5 | .81 |  |
| .9 | .39 |  |
| .3 | .48 |  |
| .7 | .2 |  |
| .1 | .7 |  |
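The handout table itself is not reproduced in this transcript. As a rough stand-in, the sketch below numerically inverts the same simple assumed model used earlier (independent observers, equal accuracy for occurrences and nonoccurrences) to back out an accuracy from a baserate and an obtained kappa; the handout's values may be based on a different model.

```python
def expected_kappa(accuracy: float, base_rate: float) -> float:
    """Expected kappa under the assumed model (see the earlier sketch)."""
    a, p = accuracy, base_rate
    po = a**2 + (1 - a)**2
    q = p * a + (1 - p) * (1 - a)
    pe = q**2 + (1 - q)**2
    return (po - pe) / (1 - pe)

def accuracy_from_kappa(kappa: float, base_rate: float) -> float:
    """Bisection: kappa rises monotonically with accuracy on [0.5, 1] in this model."""
    lo, hi = 0.5, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if expected_kappa(mid, base_rate) < kappa:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Accuracy implied by this model for the first row of the exercise table
# (baserate .5, kappa .81):
print(round(accuracy_from_kappa(0.81, 0.5), 2))
```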
Calculations for Example 1

| Observer 1 (rows) / Observer 2 (columns) | Intervals engaged | Intervals not engaged or other | Total |
|---|---|---|---|
| Intervals engaged | .8 | .1 | .9 |
| Intervals not engaged or other | .05 | .05 | .1 |
| Total | .85 | .15 | 1 |

Po = .85; Pe = .78; baserate = .88
Kappa = (.85 - .78)/(1 - .78) = .32
Accuracy (see table, pg. 1 of the handout) = .85
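A minimal sketch re-doing the Example 1 calculations from the joint-proportion table above:

```python
# Joint proportions from the Example 1 table (rows = Observer 1, columns = Observer 2).
engaged_engaged = 0.80   # both observers: interval engaged
engaged_other   = 0.10   # Observer 1 engaged, Observer 2 not engaged/other
other_engaged   = 0.05   # Observer 1 not engaged/other, Observer 2 engaged
other_other     = 0.05   # both observers: not engaged/other

p1 = engaged_engaged + engaged_other   # Observer 1 marginal = .90
p2 = engaged_engaged + other_engaged   # Observer 2 marginal = .85

po = engaged_engaged + other_other     # observed agreement = .85
pe = p1 * p2 + (1 - p1) * (1 - p2)     # chance agreement   = .78
base_rate = (p1 + p2) / 2              # best estimate      = .88
kappa = (po - pe) / (1 - pe)           # (.85 - .78)/(1 - .78) ~ .32

print(f"Po = {po:.2f}, Pe = {pe:.2f}, base rate = {base_rate:.2f}, kappa = {kappa:.2f}")
```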
Recommendations
• Calculate agreement for each code using a 2x2 table
• Use the table to determine the accuracy of observers from baserate and obtained kappa
• Report kappa and accuracy
Software to calculate kappa
• ComKappa, developed by Bakeman, calculates kappa, the standard error of kappa, kappa max, and weighted kappa.
• MOOSES, developed by Jon Tapp, calculates kappa on the total code set and on individual codes. It can be used with live coding, video coding, and transcription.
• SPSS
Challenge
• The challenge is to change the standards of observational research that demand kappas above a criterion of .6:
  Editors
  PIs
  Collaborators