
How reliable are annotations via crowdsourcing? a study about inter-annotator agreement for multi-label image annotation

Nowak, Stefanie and Rüger, Stefan (2010). How reliable are annotations via crowdsourcing? a study about inter-annotator agreement for multi-label image annotation. In: Proceedings of the international conference on Multimedia information retrieval - MIR '10, p. 557.



The creation of gold standard datasets is a costly business. Ideally, more than one judgment per document is obtained to ensure high-quality annotations. In this context, we explore how much annotations from experts differ from each other, how different sets of annotations influence the ranking of systems, and whether these annotations can be obtained with a crowdsourcing approach. This study is applied to annotations of images with multiple concepts. A subset of the images employed in the latest ImageCLEF Photo Annotation competition was manually annotated by expert annotators and by non-experts recruited through Mechanical Turk. The inter-annotator agreement is computed at an image-based and concept-based level using majority vote, accuracy and kappa statistics. Further, the Kendall τ and Kolmogorov-Smirnov correlation tests are used to compare the ranking of systems regarding different ground truths and different evaluation measures in a benchmark scenario. Results show that while the agreement between experts and non-experts varies depending on the measure used, its influence on the ranked lists of the systems is rather small. To sum up, the majority vote applied to generate one annotation set out of several opinions is able to filter noisy judgments of non-experts to some extent. The resulting annotation set is of comparable quality to the annotations of experts.
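As a rough illustration of the measures named in the abstract, the sketch below merges several non-expert votes by majority vote, compares the merged labels with an expert's labels using accuracy and Cohen's kappa, and compares two system rankings with Kendall's τ. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which kappa variant was used (Cohen's kappa is assumed here), the τ shown is the simple tau-a without tie handling, and all function names and data values are hypothetical.

```python
from collections import Counter
from itertools import combinations


def majority_vote(votes):
    """Return 1 if more than half of the binary votes are 1, else 0."""
    return 1 if sum(votes) * 2 > len(votes) else 0


def accuracy(a, b):
    """Fraction of items on which two label lists agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_observed = accuracy(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each annotator's own label frequencies.
    p_expected = sum(count_a[l] * count_b[l] for l in set(a) | set(b)) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


def kendall_tau(scores_x, scores_y):
    """Kendall tau-a between two score lists over the same systems (no ties)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(scores_x)), 2):
        s = (scores_x[i] - scores_x[j]) * (scores_y[i] - scores_y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(scores_x) * (len(scores_x) - 1) / 2
    return (concordant - discordant) / n_pairs


# Hypothetical data: three non-expert votes per image for one concept.
crowd_votes = [[1, 1, 0], [0, 0, 1], [1, 1, 1], [0, 1, 0]]
expert_labels = [1, 0, 1, 1]

merged_labels = [majority_vote(v) for v in crowd_votes]
agreement = accuracy(merged_labels, expert_labels)
kappa = cohen_kappa(merged_labels, expert_labels)

# Hypothetical scores of four systems under two different ground truths.
scores_expert_gt = [0.9, 0.8, 0.7, 0.6]
scores_crowd_gt = [0.85, 0.9, 0.6, 0.5]
tau = kendall_tau(scores_expert_gt, scores_crowd_gt)

print(merged_labels, agreement, kappa, round(tau, 3))
```

A τ close to 1 would mean the two ground truths rank the systems almost identically, which is the kind of comparison the abstract describes.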

Item Type: Conference or Workshop Item
Copyright Holders: 2010 ACM
Extra Information: This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution.
Keywords: experimentation; human factors; measurement; performance; inter-annotator agreement; crowdsourcing
Academic Unit/School: Faculty of Science, Technology, Engineering and Mathematics (STEM) > Knowledge Media Institute (KMi)
Faculty of Science, Technology, Engineering and Mathematics (STEM)
Item ID: 25874
Depositing User: Kay Dave
Date Deposited: 11 Jan 2011 12:40
Last Modified: 07 Dec 2018 11:20