The Open University

Constructing the CODA corpus: A parallel corpus of monologues and expository dialogues

Stoyanchev, Svetlana and Piwek, Paul (2010). Constructing the CODA corpus: A parallel corpus of monologues and expository dialogues. In: The seventh international conference on Language Resources and Evaluation (LREC) (Forthcoming), 18-21 May 2010, Malta.

Full text available as: PDF (Accepted Manuscript), 251kB


We describe the construction of the CODA corpus, a parallel corpus of monologues and expository dialogues. The dialogue part of the corpus consists of expository, i.e., information-delivering rather than dramatic, dialogues written by several acclaimed authors. The monologue part of the corpus is a paraphrase in monologue form of these dialogues by a human annotator. The corpus was constructed as a resource for extracting rules for automated generation of dialogue from monologue. Using authored dialogues allows us to analyse the techniques used by accomplished writers for presenting information in the form of dialogue. The dialogues are annotated with dialogue acts and the monologues with rhetorical structure. We developed annotation and translation guidelines together with a custom-developed tool for carrying out translation, alignment and annotation.

Item Type: Conference or Workshop Item
Copyright Holders: 2010 The Authors
Extra Information: The dataset described in this paper is available here.
Keywords: dialogue; generation; language
Academic Unit/School: Faculty of Science, Technology, Engineering and Mathematics (STEM) > Computing and Communications
Faculty of Science, Technology, Engineering and Mathematics (STEM)
Research Group: Centre for Research in Computing (CRC)
Item ID: 20919
Depositing User: Svetlana Stoyanchev
Date Deposited: 20 Apr 2010 13:43
Last Modified: 07 Dec 2018 13:09

© The Open University