An NLG evaluation competition? Eight reasons to be Cautious

Scott, Donia and Moore, Johanna (2006). An NLG evaluation competition? Eight reasons to be Cautious. Technical Report 2006/09; Department of Computing, The Open University.



There is a move afoot within a section of the NLG community to push for a competitive comparative evaluation of generation systems, equivalent to similar initiatives within the message understanding, information retrieval, summarisation and word sense disambiguation communities, viz. MUC, TREC, DUC, Senseval, Communicator, etc. (Reiter and Belz, 2006). While we agree that evaluation is clearly a difficult issue for NLG, and efforts to develop relevant evaluation techniques would obviously be very helpful, it is our view that an evaluation competition of the type proposed may not be sensible for NLG and could be a misguided effort that would damage rather than help the field.
