The Open UniversitySkip to content

Crowdsourcing Without a Crowd: Reliable Online Species Identification Using Bayesian Models to Minimize Crowd Size

Siddharthan, Advaith; Lambin, Christopher; Robinson, Anne-Marie; Sharma, Nirwan; Comont, Richard; O’Mahony, Elaine; Mellish, Chris and Van Der Wal, René (2016). Crowdsourcing Without a Crowd: Reliable Online Species Identification Using Bayesian Models to Minimize Crowd Size. ACM Transactions on Intelligent Systems and Technology, 7(4), article no. 45.

Full text available as:
[img] PDF (Version of Record) - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Download (790kB)
DOI (Digital Object Identifier) Link:
Google Scholar: Look up in Google Scholar


We present an incremental Bayesian model that resolves key issues of crowd size and data quality for consensus labeling. We evaluate our method using data collected from a real-world citizen science program, BeeWatch, which invites members of the public in the United Kingdom to classify (label) photographs of bumblebees as one of 22 possible species. The biological recording domain poses two key and hitherto unaddressed challenges for consensus models of crowdsourcing: (1) the large number of potential species makes classification difficult, and (2) this is compounded by limited crowd availability, stemming from both the inherent difficulty of the task and the lack of relevant skills among the general public. We demonstrate that consensus labels can be reliably found in such circumstances with very small crowd sizes of around three to five users (i.e., through group sourcing). Our incremental Bayesian model, which minimizes crowd size by re-evaluating the quality of the consensus label following each species identification solicited from the crowd, is competitive with a Bayesian approach that uses a larger but fixed crowd size and outperforms majority voting. These results have important ecological applicability: biological recording programs such as BeeWatch can sustain themselves when resources such as taxonomic experts to confirm identifications by photo submitters are scarce (as is typically the case), and feedback can be provided to submitters in a timely fashion. More generally, our model provides benefits to any crowdsourced consensus labeling task where there is a cost (financial or otherwise) associated with soliciting a label.

Item Type: Journal Item
Copyright Holders: 2016 The Authors
ISSN: 2157-6912
Project Funding Details:
Funded Project NameProject IDFunding Body
Digital Economy program to the University of Aberdeen’s dot.rural Digital Economy HubEP/G066051/1RCUK
Academic Unit/School: Faculty of Science, Technology, Engineering and Mathematics (STEM) > Knowledge Media Institute (KMi)
Faculty of Science, Technology, Engineering and Mathematics (STEM)
Item ID: 51046
Depositing User: Advaith Siddharthan
Date Deposited: 21 Sep 2017 08:02
Last Modified: 24 May 2019 14:47
Share this page:


Altmetrics from Altmetric

Citations from Dimensions

Download history for this item

These details should be considered as only a guide to the number of downloads performed manually. Algorithmic methods have been applied in an attempt to remove automated downloads from the displayed statistics but no guarantee can be made as to the accuracy of the figures.

Actions (login may be required)

Policies | Disclaimer

© The Open University   contact the OU