Rethinking the Agreement in Human Evaluation Tasks

Amidei, Jacopo; Piwek, Paul and Willis, Alistair (2018). Rethinking the Agreement in Human Evaluation Tasks. In: Proceedings of the 27th International Conference on Computational Linguistics, 20-26 Aug 2018, Santa Fe, New Mexico, pp. 3318–3329.

Full text available as: PDF (Version of Record), 170kB
URL: http://aclweb.org/anthology/C18-1281

Abstract

Human evaluations are widely thought to be more valuable the higher the inter-annotator agreement. In this paper we examine this idea, describing our experiments and analysis within the area of Automatic Question Generation. Our experiments show that annotators diverge in language annotation tasks due to a range of ineliminable factors. For this reason, we believe that annotation schemes for natural language generation tasks aimed at evaluating language quality need to be treated with great care. In particular, an unchecked focus on reducing disagreement among annotators runs the danger of creating generation goals that reward output that is more distant from, rather than closer to, natural human-like language. We conclude the paper by suggesting a new approach to the use of agreement metrics in natural language generation evaluation tasks.
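
To make the central notion concrete, the following is a minimal sketch (not taken from the paper) of Cohen's kappa, one of the standard inter-annotator agreement metrics the abstract refers to. It corrects raw agreement for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The function name and the toy "good"/"bad" ratings are illustrative assumptions, not the authors' data or code.

    from collections import Counter

    def cohen_kappa(ann_a, ann_b):
        # Cohen's kappa for two annotators labelling the same n items.
        assert len(ann_a) == len(ann_b)
        n = len(ann_a)
        # Observed agreement: fraction of items given the same label by both.
        p_o = sum(a == b for a, b in zip(ann_a, ann_b)) / n
        # Chance agreement: product of each annotator's marginal label
        # probabilities, summed over all labels (Counter returns 0 for
        # labels an annotator never used).
        freq_a, freq_b = Counter(ann_a), Counter(ann_b)
        labels = set(ann_a) | set(ann_b)
        p_e = sum((freq_a[lab] / n) * (freq_b[lab] / n) for lab in labels)
        return (p_o - p_e) / (1 - p_e)

    # Hypothetical example: two annotators rating five generated questions.
    a = ["good", "good", "bad", "good", "bad"]
    b = ["good", "bad", "bad", "good", "good"]
    print(cohen_kappa(a, b))  # ~0.17: weak agreement despite 60% raw overlap

A kappa this low would conventionally be read as a weak result; the paper's point is that such divergence can reflect genuine, ineliminable variation in human judgements of language quality rather than a defect in the annotation scheme.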

Item Type: Conference or Workshop Item
Keywords: NLG; Human Evaluation; Inter-Annotator Agreement; Automatic Question Generation
Academic Unit/School: Faculty of Science, Technology, Engineering and Mathematics (STEM) > Computing and Communications; Faculty of Science, Technology, Engineering and Mathematics (STEM)
Item ID: 56443
Depositing User: Jacopo Amidei
Date Deposited: 28 Sep 2018 08:55
Last Modified: 24 May 2019 12:12
URI: http://oro.open.ac.uk/id/eprint/56443
