Haley, Debra Trusso (2007). Using a New Inter-rater Reliability Statistic. Technical Report 2007/16; Department of Computing, The Open University.
DOI: https://doi.org/10.21954/ou.ro.00016062
Abstract
This paper discusses methods for evaluating Computer Assisted Assessment (CAA) systems, including some commonly used metrics as well as unconventional ones. I found that most of the methods for measuring automated assessment reported in the literature were not useful for my purposes. After much research, I found a new metric, the Gwet AC1 inter-rater reliability (IRR) statistic (Gwet, 2001), that is a good solution for evaluating CAAs. Section 1.6 discusses AC1, but first I describe other possible metrics to explain why I think AC1 is the best available statistic for evaluating an automated assessment system.