A corpus analysis of discourse relations for Natural Language Generation

Williams, Sandra and Reiter, Ehud (2003). A corpus analysis of discourse relations for Natural Language Generation. In: Proceedings of the Corpus Linguistics 2003 conference, 28-31 Mar 2003, Lancaster University, UK, pp. 899–908.

URL: http://ucrel.lancs.ac.uk/cl2003/#proceedings


We are developing a Natural Language Generation (NLG) system that generates texts tailored to the reading ability of individual readers. As part of building this system, GIRL (Generator for Individual Reading Levels), we analysed the RST Discourse Treebank Corpus to find out how human writers linguistically realise discourse relations. The analysis had two goals: (a) to create a model of the choices that must be made when realising discourse relations, and (b) to understand how these choices are typically made for “normal” readers across a variety of discourse relations. We present results for the following discourse relations: concession, condition, elaboration-additional, evaluation, example, reason and restatement. We then discuss these results and how they were used in GIRL.

