Using a New Inter-rater Reliability Statistic

Haley, Debra Trusso (2007). Using a New Inter-rater Reliability Statistic. Technical Report 2007/16; Department of Computing, The Open University.

DOI: https://doi.org/10.21954/ou.ro.00016062

Abstract

This paper discusses methods for evaluating Computer Assisted Assessment (CAA) systems, covering both commonly used metrics and some unconventional ones. I found that most of the methods for measuring automated assessment reported in the literature were not useful for my purposes. After much research, I found a new metric, the Gwet AC1 inter-rater reliability (IRR) statistic (Gwet, 2001), that is a good solution for evaluating CAA systems. Section 1.6 discusses AC1, but first I describe other possible metrics to explain why I think AC1 is the best metric available for evaluating an automated assessment system.
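The abstract names Gwet's AC1 but does not define it. The sketch below is an illustration and is not taken from the report. It computes AC1 for two raters, such as a human marker and a CAA system, who assign nominal categories to the same items. It uses the usual two-rater form: observed agreement p_a; chance agreement p_e = (1/(Q-1)) * sum over q of pi_q * (1 - pi_q), where pi_q is the mean proportion of items the two raters place in category q and Q is the number of categories; and AC1 = (p_a - p_e) / (1 - p_e). The function name `gwet_ac1` and the pass/fail marking data are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional, Sequence


def gwet_ac1(
    rater_a: Sequence[Hashable],
    rater_b: Sequence[Hashable],
    categories: Optional[Iterable[Hashable]] = None,
) -> float:
    """Gwet's AC1 for two raters on nominal categories (illustrative sketch)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same, non-empty set of items")

    observed = set(rater_a) | set(rater_b)
    # Q should be the size of the rating scale; default to the labels actually used.
    cats = set(categories) if categories is not None else observed
    if not observed <= cats:
        raise ValueError("ratings contain labels outside `categories`")
    q = len(cats)
    if q < 2:
        raise ValueError("AC1 needs at least two categories")

    n = len(rater_a)
    # Observed agreement: fraction of items both raters put in the same category.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # pi_q: mean share of items the two raters assign to category q.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    pis = [(counts_a[c] + counts_b[c]) / (2 * n) for c in cats]

    # Chance agreement as defined for AC1 (not Cohen's kappa's product of marginals).
    p_e = sum(pi * (1 - pi) for pi in pis) / (q - 1)

    return (p_a - p_e) / (1 - p_e)


human = ["pass"] * 7 + ["fail", "fail", "pass"]
caa = ["pass"] * 6 + ["fail", "fail", "pass", "pass"]
print(round(gwet_ac1(human, caa), 4))  # prints 0.7059
```

Here the two raters agree on 8 of 10 answers (p_a = 0.8). Each rater marks 8 answers "pass", so pi_pass = 0.8, pi_fail = 0.2 and p_e = 0.32. That gives AC1 = 0.48 / 0.68, about 0.706. Because p_e can never exceed 1/Q, the denominator 1 - p_e stays positive whenever there are at least two categories.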
