The Open University

How reliable are annotations via crowdsourcing? a study about inter-annotator agreement for multi-label image annotation

Nowak, Stefanie and Rüger, Stefan (2010). How reliable are annotations via crowdsourcing? a study about inter-annotator agreement for multi-label image annotation. In: The 11th ACM International Conference on Multimedia Information Retrieval (MIR), 29-31 Mar 2010, Philadelphia, USA, p. 557.

Creating gold-standard datasets is costly. Ideally, more than one judgment per document is obtained to ensure high annotation quality. In this context, we explore how much annotations from experts differ from each other, how different sets of annotations influence the ranking of systems, and whether these annotations can be obtained with a crowdsourcing approach. The study is applied to annotations of images with multiple concepts. A subset of the images employed in the latest ImageCLEF Photo Annotation competition was manually annotated by expert annotators and by non-experts via Mechanical Turk. The inter-annotator agreement is computed at an image-based and a concept-based level using majority vote, accuracy and kappa statistics. Further, the Kendall τ and Kolmogorov-Smirnov correlation tests are used to compare the ranking of systems regarding different ground truths and different evaluation measures in a benchmark scenario. Results show that while the agreement between experts and non-experts varies depending on the measure used, its influence on the ranked lists of the systems is rather small. To sum up, the majority vote, applied to generate one annotation set out of several opinions, is able to filter noisy judgments of non-experts to some extent. The resulting annotation set is of comparable quality to the annotations of experts.
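The measures named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which kappa variant or which Kendall τ variant was used, so Cohen's kappa for two binary label sets and τ-a (no tie correction) are assumed here. All function names, votes and system scores are made up for illustration and are not results from the study.

```python
from itertools import combinations


def majority_vote(votes):
    """Merge binary votes for one image/concept pair: 1 if more than half are 1."""
    return 1 if 2 * sum(votes) > len(votes) else 0


def accuracy(a, b):
    """Fraction of items on which two binary label lists agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa for two binary label lists (assumed variant)."""
    n = len(a)
    p_observed = accuracy(a, b)
    pa1, pb1 = sum(a) / n, sum(b) / n
    # Chance agreement: both say 1, or both say 0.
    p_expected = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    if p_expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs, ties ignored."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs


# Hypothetical data: five non-expert votes per image for one concept, six images.
turk_votes = [
    [1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
]
expert_labels = [1, 0, 1, 0, 0, 0]  # hypothetical expert annotation

merged_labels = [majority_vote(v) for v in turk_votes]
agreement = accuracy(merged_labels, expert_labels)
kappa = cohen_kappa(merged_labels, expert_labels)

# Hypothetical scores of four systems evaluated against each ground truth.
scores_expert_gt = [0.81, 0.74, 0.69, 0.55]
scores_turk_gt = [0.79, 0.70, 0.72, 0.50]
tau = kendall_tau_a(scores_expert_gt, scores_turk_gt)

print(merged_labels, round(agreement, 3), round(kappa, 3), round(tau, 3))
```

In this toy data the merged non-expert labels disagree with the expert on one image out of six, and one pair of systems swaps rank between the two ground truths. This mirrors, on a very small scale, the kind of comparison the abstract describes.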

Item Type: Conference Item
Copyright Holders: 2010 ACM
Extra Information: This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution.
Keywords: experimentation; human factors; measurement; performance; inter-annotator agreement; crowdsourcing
Academic Unit/Department: Knowledge Media Institute
Item ID: 25874
Depositing User: Kay Dave
Date Deposited: 11 Jan 2011 12:40
Last Modified: 23 Oct 2012 21:26