Towards a universal bibliography – the RefBank approach

Sautter, Guido; King, David and Morse, David (2012). Towards a universal bibliography – the RefBank approach. In: TDWG (Biodiversity Information Standards) 2012, 22-26 Oct 2012, Beijing, PRC.

URL: http://www.tdwg.org/fileadmin/2012conference/slide...

Abstract

No successful universal bibliography for systematic biology has yet been compiled. Commercial services such as Mendeley or Zotero are gaining some traction. However, they are monolithic systems under the control of single entities, and the biodiversity data they hold is scattered across them without easy aggregation or interchange; after all, why would competitors offer such functionality?

The ViBRANT project (http://vbrant.eu/) aims to compile a Bibliography of Life that gathers data from across the biodiversity sciences, building on the project’s existing infrastructure. After evaluating existing platforms, both commercial and scientific, we found that none met the needs of a sustainable, universal bibliography, for the following reasons: (1) they are too narrowly focused to achieve critical mass in data volume or user community; (2) they are monolithic systems with single points of failure and low perceived sustainability; (3) they focus on data analysis and research at the expense of building a reliable base system, so they appear to be prototypes rather than stable platforms, which reduces their perceived reliability and sustainability; (4) they integrate data curation with data input, which makes contributing references tedious and alienates potential users.

We therefore built RefBank following a radically different approach, one that applies principles proven in other forms of data management. (1) RefBank is an open, coordinator-free network of independent nodes that replicate the data among themselves; this eliminates any single point of failure and achieves reliability and sustainability through redundancy. (2) No single entity governs the data: anyone can set up a node and link it into the network, and the web application can be downloaded from most existing nodes. Replication is pull-based, so no node can actively push erroneous data into the network. (3) Contributing is easy: anyone can upload bibliographies in BibTeX, EndNote, plain text, or many other common formats, without any curation and without prior registration; ReCAPTCHA protects the upload forms. (4) RefBank uses graph theory to embrace near-duplicate references, exploiting their inherent redundancy to enable automated reconciliation and curation through data mining techniques. (5) RefBank’s web interface supports manual curation, but the system does not depend on it: users can correct errors as they encounter them while using the reference collection. (6) RefBank exports data in multiple formats, e.g. BibTeX and RIS, and can render references in a variety of common citation styles, e.g. Chicago or Harvard.
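The abstract does not spell out how RefBank's graph-based reconciliation works. One common way to let near-duplicate references reinforce each other is to treat each reference string as a node, link two nodes when they are sufficiently similar, and read off connected components as clusters of variants of the same work. The Python sketch below does this; the token-set Jaccard similarity, the 0.7 threshold and all function names are illustrative assumptions, not RefBank's actual method.

```python
import re
from itertools import combinations


def tokens(ref: str) -> set[str]:
    """Lower-case word tokens of a reference string (punctuation dropped)."""
    return set(re.findall(r"\w+", ref.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_near_duplicates(refs: list[str], threshold: float = 0.7) -> list[list[int]]:
    """Group reference indices into connected components of a similarity graph.

    Nodes are references; an edge joins two references whose token Jaccard
    similarity is at least `threshold` (a hypothetical cut-off). Components
    are found with union-find, so variants linked only through a chain of
    intermediate variants still end up in the same cluster.
    """
    parent = list(range(len(refs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    toks = [tokens(r) for r in refs]
    for i, j in combinations(range(len(refs)), 2):
        if jaccard(toks[i], toks[j]) >= threshold:
            parent[find(i)] = find(j)  # add edge i-j by merging components

    groups: dict[int, list[int]] = {}
    for i in range(len(refs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Usage: two differently formatted citations of the same work, plus an unrelated one.
refs = [
    "Sautter G, King D, Morse D (2012) Towards a universal bibliography. TDWG 2012, Beijing.",
    "Sautter G., King D. & Morse D. 2012: Towards a universal bibliography - the RefBank approach. TDWG, Beijing",
    "Darwin C (1859) On the origin of species. John Murray, London.",
]
print(cluster_near_duplicates(refs))  # [[0, 1], [2]]
```

Because clusters are connected components rather than isolated pairwise matches, two variants too different to be linked directly can still be grouped when an intermediate variant resembles both, which is one way the redundancy among duplicates can be put to work. The all-pairs comparison is quadratic, so a large collection would need some blocking step (for example by year) before comparing.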
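Pull-based replication among peer nodes can be illustrated with a small in-memory model. In the Python sketch below, each node keeps an append-only change log; another node only ever reads new entries from that log and decides locally what to accept, and there is no method through which one node could write into another. The per-record version numbers (higher version wins), the `is_acceptable` check and all class and method names are hypothetical, chosen for illustration rather than taken from RefBank.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    ref_id: str   # hypothetical identifier of a reference
    text: str     # the reference string itself
    version: int  # hypothetical per-record version; higher wins


@dataclass
class Node:
    name: str
    records: dict[str, Record] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)          # ref_ids in local arrival order
    cursor: dict[str, int] = field(default_factory=dict)  # per peer: how far into its log we have read

    def store(self, rec: Record) -> bool:
        """Store a record locally unless an equal or newer version is already held."""
        local = self.records.get(rec.ref_id)
        if local is not None and local.version >= rec.version:
            return False
        self.records[rec.ref_id] = rec
        self.log.append(rec.ref_id)
        return True

    def changes_since(self, pos: int) -> tuple[list[Record], int]:
        """Read-only view offered to peers: current records changed after log position `pos`."""
        ids = dict.fromkeys(self.log[pos:])  # de-duplicate while keeping order
        return [self.records[i] for i in ids], len(self.log)

    def pull_from(self, peer: "Node") -> int:
        """Fetch new changes from `peer`; this node alone decides what to accept."""
        changes, new_pos = peer.changes_since(self.cursor.get(peer.name, 0))
        self.cursor[peer.name] = new_pos
        applied = 0
        for rec in changes:
            if self.is_acceptable(rec) and self.store(rec):
                applied += 1
        return applied

    @staticmethod
    def is_acceptable(rec: Record) -> bool:
        # Hypothetical local validation rule; a real node would apply its own checks.
        return bool(rec.text.strip())


# Usage: data spreads from a to c via b without any coordinator.
a, b, c = Node("a"), Node("b"), Node("c")
a.store(Record("r1", "Sautter et al. 2012", 1))
b.pull_from(a)
c.pull_from(b)
print(c.records["r1"].text)  # Sautter et al. 2012
```

The cursor is a position in the peer's own log rather than a timestamp, so a record that reaches a peer late (for instance from a third node) is still picked up by the next pull from that peer. Bad data offered by a peer is simply not taken: the rejection happens on the pulling side.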
