Interrater Reliability Estimation via Maximum Likelihood for Gwet’s Chance Agreement Model

DOI: 10.4236/ojs.2024.145021, PP. 481-491

Keywords: Interrater Reliability, Agreement, Reliability, Kappa


Abstract:

Interrater reliability (IRR) statistics, like Cohen’s kappa, measure agreement between raters beyond what is expected by chance when classifying items into categories. While Cohen’s kappa has been widely used, it has several limitations, prompting the development of Gwet’s agreement statistic, an alternative “kappa” statistic which models chance agreement via an “occasional guessing” model. However, we show that Gwet’s formula for estimating the proportion of agreement due to chance is itself biased for intermediate levels of agreement, despite overcoming limitations of Cohen’s kappa at high and low agreement levels. We derive a maximum likelihood estimator for the occasional guessing model that yields an unbiased estimator of the IRR, which we call the maximum likelihood kappa (κ_ML). The key result is that the chance agreement probability under the occasional guessing model is simply equal to the observed rate of disagreement between raters. The κ_ML statistic provides a theoretically principled approach to quantifying IRR that addresses limitations of previous κ coefficients. Given the widespread use of IRR measures, having an unbiased estimator is important for reliable inference across domains where rater judgments are analyzed.
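To make the key result concrete, the sketch below is an illustrative Python implementation (not taken from the paper; the function names, interface, and example data are assumptions). It takes the usual chance-corrected form κ = (p_o − p_c) / (1 − p_c) and, following the abstract’s statement, sets p_c equal to the observed disagreement rate, with Cohen’s kappa computed alongside for comparison.

```python
import numpy as np

def kappa_ml(ratings_a, ratings_b):
    """Chance-corrected agreement with the chance term set to the observed
    disagreement rate, as stated in the abstract. Illustrative sketch only."""
    a, b = np.asarray(ratings_a), np.asarray(ratings_b)
    p_o = np.mean(a == b)           # observed agreement rate
    p_c = 1.0 - p_o                 # chance agreement = observed disagreement
    return (p_o - p_c) / (1.0 - p_c)  # algebraically (2*p_o - 1) / p_o

def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for comparison: chance agreement from the raters' marginals."""
    a, b = np.asarray(ratings_a), np.asarray(ratings_b)
    categories = np.union1d(a, b)
    p_o = np.mean(a == b)
    p_c = sum(np.mean(a == c) * np.mean(b == c) for c in categories)
    return (p_o - p_c) / (1.0 - p_c)

# Hypothetical example: two raters, binary labels, 8/10 items in agreement
a = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
b = [1, 1, 0, 0, 1, 0, 1, 1, 1, 1]
print(round(kappa_ml(a, b), 3), round(cohen_kappa(a, b), 3))  # 0.75 vs. ~0.524
```

With p_o = 0.8 the two estimators differ because Cohen’s kappa derives its chance term from the marginal category rates, whereas the sketch above uses the observed disagreement rate directly.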

References

[1]  Gwet, K.L. (2014) Handbook of Inter-Rater Reliability: The Definitive Guide to Measuring the Extent of Agreement among Raters. Advanced Analytics, LLC.
[2]  Cicchetti, D.V. and Feinstein, A.R. (1990) High Agreement but Low Kappa: II. Resolving the Paradoxes. Journal of Clinical Epidemiology, 43, 551-558.
https://doi.org/10.1016/0895-4356(90)90159-m
[3]  Feinstein, A.R. and Cicchetti, D.V. (1990) High Agreement but Low Kappa: I. The Problems of Two Paradoxes. Journal of Clinical Epidemiology, 43, 543-549.
https://doi.org/10.1016/0895-4356(90)90158-l
[4]  Wongpakaran, N., Wongpakaran, T., Wedding, D. and Gwet, K.L. (2013) A Comparison of Cohen’s Kappa and Gwet’s AC1 When Calculating Inter-Rater Reliability Coefficients: A Study Conducted with Personality Disorder Samples. BMC Medical Research Methodology, 13, Article No. 61.
https://doi.org/10.1186/1471-2288-13-61
[5]  Ohyama, T. (2020) Statistical Inference of Gwet’s AC1 Coefficient for Multiple Raters and Binary Outcomes. Communications in Statistics - Theory and Methods, 50, 3564-3572.
https://doi.org/10.1080/03610926.2019.1708397
[6]  Jimenez, A.M. and Zepeda, S.J. (2020) A Comparison of Gwet’s AC1 and Kappa When Calculating Inter-Rater Reliability Coefficients in a Teacher Evaluation Context. Journal of Education Human Resources, 38, 290-300.
https://doi.org/10.3138/jehr-2019-0001
[7]  Gaspard, N., Hirsch, L.J., LaRoche, S.M., Hahn, C.D. and Westover, M.B. (2014) Interrater Agreement for Critical Care EEG Terminology. Epilepsia, 55, 1366-1373.
https://doi.org/10.1111/epi.12653
[8]  Cohen, J. (1960) A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement, 20, 37-46.
https://doi.org/10.1177/001316446002000104
[9]  Cohen, J. (1968) Weighted Kappa: Nominal Scale Agreement Provision for Scaled Disagreement or Partial Credit. Psychological Bulletin, 70, 213-220.
https://doi.org/10.1037/h0026256
[10]  Gwet, K. (2002) Kappa Statistic Is Not Satisfactory for Assessing the Extent of Agreement between Raters. Statistical Methods for Inter-Rater Reliability Assessment, 1, 1-6.
[11]  Gwet, K. (2002) Inter-Rater Reliability: Dependency on Trait Prevalence and Marginal Homogeneity. Statistical Methods for Inter-Rater Reliability Assessment, 2, 1-9.
[12]  Gwet, K.L. (2008) Computing Inter-Rater Reliability and Its Variance in the Presence of High Agreement. British Journal of Mathematical and Statistical Psychology, 61, 29-48.
https://doi.org/10.1348/000711006x126600
[13]  Byrt, T., Bishop, J. and Carlin, J.B. (1993) Bias, Prevalence and Kappa. Journal of Clinical Epidemiology, 46, 423-429.
https://doi.org/10.1016/0895-4356(93)90018-v
[14]  Uebersax, J.S. (1987) Diversity of Decision-Making Models and the Measurement of Interrater Agreement. Psychological Bulletin, 101, 140-146.
https://doi.org/10.1037//0033-2909.101.1.140
[15]  Viera, A.J. and Garrett, J.M. (2005) Understanding Interobserver Agreement: The Kappa Statistic. Family Medicine, 37, 360-363.
[16]  Strijbos, J., Martens, R.L., Prins, F.J. and Jochems, W.M.G. (2006) Content Analysis: What Are They Talking about? Computers & Education, 46, 29-48.
https://doi.org/10.1016/j.compedu.2005.04.002
