Using a New Inter-rater Reliability Statistic

Haley, Debra Trusso (2007). Using a New Inter-rater Reliability Statistic. Technical Report 2007/16; Department of Computing, The Open University.



This paper discusses methods for evaluating Computer Assisted Assessment (CAA) systems, including both commonly used and unconventional metrics. I found that most of the methods reported in the literature for measuring automated assessment were not useful for my purposes. After much research, I found a new metric, the Gwet AC1 inter-rater reliability (IRR) statistic (Gwet, 2001), that is well suited to evaluating CAAs. Section 1.6 discusses AC1, but first I describe other possible metrics to explain why I think AC1 is the best available statistic for evaluating an automated assessment system.
