Stoyanchev, Svetlana and Piwek, Paul
(2010).
URL: http://www.lrec-conf.org/lrec2010/
Abstract
We describe the construction of the CODA corpus, a parallel corpus of monologues and expository dialogues. The dialogue part of the corpus consists of expository (that is, information-delivering rather than dramatic) dialogues written by several acclaimed authors. The monologue part consists of a human annotator's paraphrases of these dialogues in monologue form. The corpus was built as a resource for extracting rules for the automated generation of dialogue from monologue. Using authored dialogues allows us to analyse the techniques that accomplished writers use to present information in dialogue form. The dialogues are annotated with dialogue acts, and the monologues with rhetorical structure. We also developed annotation and translation guidelines, together with a custom tool for carrying out translation, alignment and annotation.
About
- Item ORO ID: 20919
- Item Type: Conference or Workshop Item
- Extra Information: The dataset described in this paper is available here.
- Keywords: dialogue; generation; language
- Academic Unit or School: Faculty of Science, Technology, Engineering and Mathematics (STEM) > Computing and Communications; Faculty of Science, Technology, Engineering and Mathematics (STEM)
- Research Group: Centre for Research in Computing (CRC)
- Copyright Holders: © 2010 The Authors
- Depositing User: Svetlana Stoyanchev