
One document, many users: what happens when you re-purpose a document?

King, David; Morse, David and Lyal, Chris (2013). One document, many users: what happens when you re-purpose a document? In: BioCuration 2013, 07-10 Apr 2013, Churchill College, Cambridge, UK.



Assessing global challenges such as climate change and invasive species requires a baseline of historical data. In biodiversity we are fortunate that such data exist in a rich body of literature. One such source is the Biologia Centrali-Americana (BCA), which documents the plant and animal life of Central America one hundred years ago and can be compared with contemporary species distributions. This valuable resource has recently been re-keyed and manually marked up by the INOTAXA project. The 56-volume work is now being curated before wider release.

The manual annotation of the BCA is time-consuming in its initial phases and demands expert review to curate the results. This manual approach to mining historic texts is not viable for large-scale works such as the BCA. Attempts to automate the process face the problem of not having suitable corpora against which to develop and then test automated solutions such as text mining. One project, ViBRANT, sought to use the scale of the re-keyed data being produced by INOTAXA to develop a solution to this problem. However, this apparently straightforward task has thrown up many issues, because different audiences have different requirements of the mark up.

This presentation describes the process by which the BCA is being reworked from digitisation through to a curated document corpus. The intended users are biodiversity scientists who can use the corpus for taxonomic and biodiversity research, and computer scientists who can use it to develop new text mining and mark up tools. The presentation covers the different requirements of scientists in the two domains, how this affects the mark up required of the documents, and how to re-purpose the annotations to meet the needs of different and sometimes disparate scientific audiences.
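As a rough illustration of what re-purposing annotations can involve, the sketch below converts inline XML mark up into plain text plus stand-off spans, a form that text mining tools often prefer. It is not the method used by INOTAXA or ViBRANT: the element names (treatment, taxonName, locality), the sample sentence, and the function to_standoff are all hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the element names and the taxon name are made up
# for illustration and do not reflect the actual INOTAXA schema or BCA text.
SAMPLE = (
    "<treatment>"
    "<taxonName>Examplus fictus</taxonName>"
    " was collected in "
    "<locality>Panama</locality>."
    "</treatment>"
)


def to_standoff(xml_text):
    """Return (plain_text, spans); each span is a (start, end, label) tuple.

    Offsets are character positions in plain_text, so the inline tags can be
    discarded while the annotations are kept alongside the text.
    """
    root = ET.fromstring(xml_text)
    parts = []   # text fragments in document order
    spans = []   # stand-off annotations
    offset = 0   # running character offset into the plain text

    def walk(elem, is_root):
        nonlocal offset
        start = offset
        if elem.text:
            parts.append(elem.text)
            offset += len(elem.text)
        for child in elem:
            walk(child, False)
            # Text following a child element belongs to the parent's content.
            if child.tail:
                parts.append(child.tail)
                offset += len(child.tail)
        if not is_root:
            spans.append((start, offset, elem.tag))

    walk(root, True)
    return "".join(parts), spans


if __name__ == "__main__":
    text, spans = to_standoff(SAMPLE)
    print(text)
    for start, end, label in spans:
        print(label, start, end, text[start:end])
```

A biodiversity scientist might keep the inline XML for browsing and querying by taxon or locality, while a computer scientist could feed the plain text and spans to a text mining tool as training or evaluation data. Both views derive from the same curated source.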

Item Type: Conference or Workshop Item
Copyright Holders: 2013 ViBRANT
Keywords: curation; data; xml; biodiversity; taxonomy
Academic Unit/School: Faculty of Science, Technology, Engineering and Mathematics (STEM) > Computing and Communications
Faculty of Science, Technology, Engineering and Mathematics (STEM)
Research Group: Centre for Research in Computing (CRC)
Item ID: 37524
Depositing User: David King
Date Deposited: 02 May 2013 08:10
Last Modified: 11 Jun 2020 17:48