Collecting Reliable Human Judgements on Machine-Generated Language: The Case of the QG-STEC Data

Godwin, Keith and Piwek, Paul (2016). Collecting Reliable Human Judgements on Machine-Generated Language: The Case of the QG-STEC Data. In: Proceedings of the 9th International Natural Language Generation Conference (Isard, Amy; Rieser, Verena and Gkatzia, Dimitra eds.), Association for Computational Linguistics, Edinburgh, pp. 212–216.

URL: http://www.macs.hw.ac.uk/InteractionLab/INLG2016/#

Abstract

Question generation (QG) is the problem of automatically generating questions from inputs such as declarative sentences. Task B of the Question Generation Shared Task Evaluation Challenge (QG-STEC), held in 2010, evaluated several state-of-the-art QG systems. However, analysis of the evaluation results was affected by low inter-rater reliability. We adapted Nonaka and Takeuchi's knowledge creation cycle to the task of improving the evaluation annotation guidelines; a preliminary test showed clearly improved inter-rater reliability.