The Open University

A fact-aligned corpus of numerical expressions

Williams, Sandra and Power, Richard (2010). A fact-aligned corpus of numerical expressions. In: Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC 2010), 19-21 May 2010, Malta.

Full text available as: PDF (Version of Record), 348Kb
URL: http://www.lrec-conf.org/proceedings/lrec2010/inde...

Abstract

We describe a corpus of numerical expressions, developed as part of the NUMGEN project. The corpus contains newspaper articles and scientific papers in which exactly the same numerical facts are presented many times (both within and across texts). Some annotations of numerical facts are original: for example, numbers are automatically classified as round or non-round by an algorithm derived from Jansen and Pollmann (2001); also, numerical hedges such as 'about' or 'a little under' are marked up and classified semantically using arithmetical relations. Through explicit alignment of phrases describing the same fact, the corpus can support research on the influence of various contextual factors (e.g., document position, intended readership) on the way in which numerical facts are expressed. As an example we present results from an investigation showing that when a fact is mentioned more than once in a text, there is a clear tendency for precision to increase from first to subsequent mentions, and for mathematical level either to remain constant or to increase.

Item Type: Conference Item
Copyright Holders: 2010 The Authors
Keywords: corpus (creation, annotation, etc.); natural language generation; multiword expressions; collocations
Academic Unit/Department: Mathematics, Computing and Technology > Computing & Communications
Interdisciplinary Research Centre: Centre for Research in Computing (CRC)
Item ID: 21500
Depositing User: Sandra Williams
Date Deposited: 03 Jun 2010 10:32
Last Modified: 10 Dec 2012 13:21
URI: http://oro.open.ac.uk/id/eprint/21500