Curation tools for taxonomic databases

Morse, David; De Roeck, Anne; Willis, Alistair and Yang, Hui (2013). Curation tools for taxonomic databases. In: BioCuration 2013, Sunday 7 April to Wednesday 10 April 2013, Churchill College, Cambridge, UK.


Biological taxonomy is the classification of living and fossil organisms. Taxonomists have identified and named some 1.8 million species of animals, plants, and microorganisms, a fraction of Earth's estimated 5‐30 million species. Part of the effort of taxonomy lies in developing and curating taxonomic databases, which support access to the taxonomic literature and provide the basic knowledge needed for the management and conservation of biodiversity.

A major difficulty in this task is incorporating knowledge that is currently contained only in the historical literature. Extracting this knowledge is difficult and labour‐intensive: scanning errors and other variations in nomenclature mean that individual names must be verified manually. For example, Actinobacillus actionomy, Actinobacillus actionomyce, and Actinobacillus actionomycetam could all be variants of the same name.

ComTax is an ongoing project to develop a community‐driven curation process among taxonomists, by providing tools that help them identify and validate taxonomic names from the scanned historical literature. The system operates on scanned documents after optical character recognition (OCR). The key stages are:

1. Identify possible taxonomic names from scanned text. Names might be new either because they do not appear in existing databases, or because they have been incorrectly identified by OCR.

2. Present the proposed name to a domain expert for validation or correction.

3. Present validated taxonomic names for curation. Organizations like Global Biodiversity Information Facility (GBIF) manage the curation of taxonomic databases.

This poster describes the technical challenges facing the ComTax project, and will discuss our work with the Natural History Museum to integrate the curation process into taxonomists' workflow. We also demonstrate the relevance of this work within the wider context of biological taxonomy curation.
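As a rough illustration of stages 1 and 2 above, the following Python sketch pulls candidate binomial names out of OCR text and groups likely variants so that an expert could review them together. It is not the ComTax implementation: the regular expression, the similarity threshold of 0.85, the sample sentence, and the function names `candidate_names` and `group_variants` are all assumptions made for this example, and the string similarity comes from the standard-library `difflib` module rather than any method named in the abstract.

```python
import re
from difflib import SequenceMatcher

# Hypothetical OCR output; the three Actinobacillus spellings are the
# variants quoted in the abstract, the sentence around them is invented.
OCR_TEXT = (
    "Samples of Actinobacillus actionomy and Actinobacillus actionomyce "
    "were compared with Actinobacillus actionomycetam, not Escherichia coli."
)

# Simplified assumption: a binomial is a capitalised genus followed by a
# lowercase epithet of at least three letters. Real text needs more care.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


def candidate_names(text):
    """Stage 1 (sketch): return possible taxonomic names in reading order."""
    return [f"{genus} {epithet}" for genus, epithet in BINOMIAL.findall(text)]


def similarity(a, b):
    """Similarity in [0, 1] based on difflib's matching-blocks ratio."""
    return SequenceMatcher(None, a, b).ratio()


def group_variants(names, threshold=0.85):
    """Greedily group names that resemble the first member of a group.

    Each group is a list; a name joins the first group whose first
    member is at least `threshold` similar, otherwise it starts a new one.
    """
    groups = []
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


if __name__ == "__main__":
    # Stage 2 (sketch): each group would be shown to a domain expert,
    # who confirms or corrects the name before it goes on to curation.
    for group in group_variants(candidate_names(OCR_TEXT)):
        print(group)
```

Grouping is only a triage step here. Whether the members of a group really are the same name is left to the expert, as in stage 2.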

Item Type: Conference or Workshop Item
Copyright Holders: 2013 Open University
Project Funding Details:
Funded Project Name: Not Set
Project ID: Not Set
Funding Body: JISC
Keywords: curation; data; biodiversity; taxonomy
Academic Unit/School: Faculty of Science, Technology, Engineering and Mathematics (STEM) > Computing and Communications
Faculty of Science, Technology, Engineering and Mathematics (STEM)
Research Group: Centre for Research in Computing (CRC)
Item ID: 37536
Depositing User: David King
Date Deposited: 02 May 2013 13:27
Last Modified: 05 Oct 2016 00:16