Willis, Alistair; King, David; Morse, David; Dil, Anton; Lyal, Chris and Roberts, David (2010).
URL: http://www.lrec-conf.org/proceedings/lrec2010/pdf/...
Abstract
We present the ABLE document collection, which consists of a set of annotated volumes of the Bulletin of the British Museum (Natural History). These follow our work on automating the markup of scanned copies of the biodiversity literature, for the purpose of supporting working taxonomists. We consider an enhanced TEI XML markup language, which is used as an intermediate stage in translating from the initial XML obtained from Optical Character Recognition to the target taXMLit. The intermediate representation allows additional information from external sources such as a taxonomic thesaurus to be incorporated before the final translation into taXMLit.
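The enrichment step described in the abstract, where information from a taxonomic thesaurus is added to the intermediate representation before the final translation, can be illustrated with a minimal sketch. It is not the authors' implementation: the element name `name`, the attributes `type="taxon"` and `key`, the function `enrich`, and the one-entry thesaurus are all assumptions chosen for illustration, and the real enhanced TEI and taXMLit schemas are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical thesaurus: maps a taxon name as printed to a fuller accepted form.
# The single entry is illustrative only, not data from the ABLE collection.
THESAURUS = {"Papilio machaon": "Papilio machaon Linnaeus, 1758"}


def enrich(tei_xml: str, thesaurus: dict[str, str]) -> str:
    """Annotate each <name type="taxon"> element that has a thesaurus entry.

    The element and attribute names are assumptions, not the ABLE schema.
    """
    root = ET.fromstring(tei_xml)
    for name in root.iter("name"):
        if name.get("type") != "taxon":
            continue
        # OCR output often contains irregular spacing, so normalise whitespace
        # before looking the name up.
        printed = " ".join((name.text or "").split())
        accepted = thesaurus.get(printed)
        if accepted is not None:
            name.set("key", accepted)
    return ET.tostring(root, encoding="unicode")


# A made-up fragment standing in for the intermediate markup.
SAMPLE = '<p>The species <name type="taxon">Papilio  machaon</name> is common.</p>'
enriched = enrich(SAMPLE, THESAURUS)
print(enriched)
```

Keeping the lookup as a separate pass over the intermediate document means the OCR output is left untouched and the final translation can read the added attributes without consulting the thesaurus itself.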
About
- Item ORO ID
- 20856
- Item Type
- Conference or Workshop Item
- Project Funding Details
- Funded Project Name: Not Set; Project ID: Not Set; Funding Body: JISC
- Keywords
- biodiversity; information extraction; xml
- Academic Unit or School
- Faculty of Science, Technology, Engineering and Mathematics (STEM) > Computing and Communications
- Faculty of Science, Technology, Engineering and Mathematics (STEM)
- Research Group
- Centre for Research in Computing (CRC)
- Copyright Holders
- © 2010 The Authors
- Depositing User
- Alistair Willis