Scott, Donia and Moore, Johanna (2006). An NLG evaluation competition? Eight reasons to be cautious. Technical Report 2006/09; Department of Computing, The Open University.
DOI: https://doi.org/10.21954/ou.ro.00016050
Abstract
There is a move afoot within a section of the NLG community to push for a competitive comparative evaluation of generation systems, equivalent to similar initiatives within the message understanding, information retrieval, summarisation and word sense disambiguation communities, viz. MUC, TREC, DUC, Senseval, Communicator, etc. (Reiter and Belz, 2006). While we agree that evaluation is clearly a difficult issue for NLG, and efforts to develop relevant evaluation techniques would obviously be very helpful, it is our view that an evaluation competition of the type proposed may not be sensible for NLG and could be a misguided effort that would damage rather than help the field.