A fact-aligned corpus of numerical expressions

Williams, Sandra and Power, Richard (2010). A fact-aligned corpus of numerical expressions. In: Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC 2010), 19-21 May 2010, Malta.

URL: http://www.lrec-conf.org/proceedings/lrec2010/inde...

Abstract

We describe a corpus of numerical expressions, developed as part of the NUMGEN project. The corpus contains newspaper articles and scientific papers in which exactly the same numerical facts are presented many times (both within and across texts). Some annotations of numerical facts are original: for example, numbers are automatically classified as round or non-round by an algorithm derived from Jansen and Pollmann (2001); also, numerical hedges such as 'about' or 'a little under' are marked up and classified semantically using arithmetical relations. Through explicit alignment of phrases describing the same fact, the corpus can support research on the influence of various contextual factors (e.g., document position, intended readership) on the way in which numerical facts are expressed. As an example we present results from an investigation showing that when a fact is mentioned more than once in a text, there is a clear tendency for precision to increase from first to subsequent mentions, and for mathematical level either to remain constant or to increase.
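
The two annotation ideas mentioned in the abstract (classifying numbers as round or non-round, and mapping hedges to arithmetical relations) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the roundness test is a simplified stand-in, not the algorithm derived from Jansen and Pollmann (2001), and the hedge inventory, relation names, and 5% tolerance are hypothetical rather than taken from the corpus.

```python
# Hypothetical hedge inventory: phrase -> arithmetical relation between the
# stated (hedged) value and the exact value. Not the corpus's actual scheme.
HEDGE_RELATIONS = {
    "about": "approx",
    "around": "approx",
    "a little under": "less_than",
    "nearly": "less_than",
    "more than": "greater_than",
    "over": "greater_than",
}


def is_round(n: int) -> bool:
    """Simplified roundness heuristic (an assumption, not the paper's algorithm).

    After stripping trailing zeros, a number counts as round if a single
    digit remains (e.g. 5000 -> 5) or two digits ending in 5 (e.g. 250 -> 25).
    """
    n = abs(n)
    if n == 0:
        return True
    while n % 10 == 0:
        n //= 10
    return n < 10 or (n < 100 and n % 10 == 5)


def hedge_holds(hedge: str, stated: float, actual: float,
                tolerance: float = 0.05) -> bool:
    """Check whether the exact value stands in the hedge's relation to the stated value.

    The 5% tolerance for "approx" is an arbitrary illustrative choice.
    """
    relation = HEDGE_RELATIONS[hedge]
    if relation == "approx":
        return abs(actual - stated) <= tolerance * abs(stated)
    if relation == "less_than":
        return actual < stated
    return actual > stated


if __name__ == "__main__":
    # "about 5000" used to express an exact figure of 4937
    print(is_round(5000), is_round(4937))
    print(hedge_holds("about", 5000, 4937))
    print(hedge_holds("a little under", 50, 49.3))
```

Under these assumptions, a phrase such as "about 5000" for an exact figure of 4937 would be annotated as a round number with an approximate-equality hedge, which is the kind of aligned, comparable record the abstract describes.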

Item Type: Conference Item
Copyright Holders: 2010 The Authors
Keywords: corpus (creation, annotation, etc.); natural language generation; multiword expressions; collocations
Academic Unit/Department: Mathematics, Computing and Technology > Computing & Communications
Mathematics, Computing and Technology
Interdisciplinary Research Centre: Centre for Research in Computing (CRC)
Item ID: 21500
Depositing User: Sandra Williams
Date Deposited: 03 Jun 2010 10:32
Last Modified: 29 Feb 2016 17:30
URI: http://oro.open.ac.uk/id/eprint/21500