The Open University
 

From XML to XML: The why and how of making the biodiversity literature accessible to researchers

Willis, Alistair; King, David; Morse, David; Dil, Anton; Lyal, Chris and Roberts, David (2010). From XML to XML: The why and how of making the biodiversity literature accessible to researchers. In: Language Resources and Evaluation Conference (LREC), 19-21 May 2010, Malta, pp. 1237–1244.

Full text available as: PDF (Accepted Manuscript, 1557Kb)
URL: http://www.lrec-conf.org/proceedings/lrec2010/pdf/...

Abstract

We present the ABLE document collection, which consists of a set of annotated volumes of the Bulletin of the British Museum (Natural History). These follow our work on automating the markup of scanned copies of the biodiversity literature, for the purpose of supporting working taxonomists. We consider an enhanced TEI XML markup language, which is used as an intermediate stage in translating from the initial XML obtained from Optical Character Recognition to the target taXMLit. The intermediate representation allows additional information from external sources such as a taxonomic thesaurus to be incorporated before the final translation into taXMLit.
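The three-stage flow described in the abstract (OCR XML, then an enriched intermediate XML, then the target markup) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the element names (`page`, `line`, `body`, `p`, `name`, `treatments`, `taxonName`), the sample sentence, the dictionary standing in for a taxonomic thesaurus, and the function names are all hypothetical. The sketch does not reproduce the actual OCR output format, the enhanced TEI schema, or taXMLit.

```python
import xml.etree.ElementTree as ET

# Hypothetical OCR output; the element names are placeholders, not a real OCR schema.
OCR_XML = "<page><line>Aus bus Smith, 1900 was collected in Malta.</line></page>"

# Hypothetical stand-in for a taxonomic thesaurus: known name -> identifier.
THESAURUS = {"Aus bus": "taxon-0001"}


def ocr_to_intermediate(ocr_root):
    """Stage 1: copy each OCR line into a paragraph of the intermediate document."""
    body = ET.Element("body")
    for line in ocr_root.iter("line"):
        p = ET.SubElement(body, "p")
        p.text = line.text or ""
    return body


def enrich(body, thesaurus):
    """Stage 2: mark up a thesaurus name found in a paragraph's text."""
    # Materialise the list first, because the tree is modified inside the loop.
    for p in list(body.iter("p")):
        text = p.text or ""
        for name, taxon_id in thesaurus.items():
            idx = text.find(name)
            if idx == -1:
                continue
            p.text = text[:idx]
            tag = ET.SubElement(p, "name", {"type": "taxon", "ref": taxon_id})
            tag.text = name
            tag.tail = text[idx + len(name):]
            break  # keep the sketch simple: at most one annotation per paragraph
    return body


def intermediate_to_target(body):
    """Stage 3: emit a target document that lists the annotated names."""
    target = ET.Element("treatments")
    for tag in body.iter("name"):
        if tag.get("type") == "taxon":
            entry = ET.SubElement(target, "taxonName", {"ref": tag.get("ref")})
            entry.text = tag.text
    return target


def run_pipeline(ocr_xml, thesaurus):
    """Run all three stages; return the intermediate and target trees."""
    body = enrich(ocr_to_intermediate(ET.fromstring(ocr_xml)), thesaurus)
    return body, intermediate_to_target(body)


if __name__ == "__main__":
    intermediate, target = run_pipeline(OCR_XML, THESAURUS)
    print(ET.tostring(intermediate, encoding="unicode"))
    print(ET.tostring(target, encoding="unicode"))
```

Keeping the enrichment step separate from both translations mirrors the point made in the abstract: external information is added to the intermediate representation, so the final translation only has to read annotations that are already present.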

Item Type: Conference Item
Copyright Holders: 2010 The Authors
Project Funding Details:
Funded Project Name | Project ID | Funding Body
Not Set | Not Set | JISC
Keywords: biodiversity; information extraction; XML
Academic Unit/Department: Mathematics, Computing and Technology > Computing & Communications
Mathematics, Computing and Technology
Interdisciplinary Research Centre: Centre for Research in Computing (CRC)
Item ID: 20856
Depositing User: Alistair Willis
Date Deposited: 15 Jun 2010 16:07
Last Modified: 24 Feb 2016 12:00
URI: http://oro.open.ac.uk/id/eprint/20856
