Performance measures for multilabel evaluation

Nowak, Stefanie; Lukashevich, Hanna; Dunker, Peter and Rüger, Stefan (2010). Performance measures for multilabel evaluation. In: Proceedings of the international conference on Multimedia information retrieval - MIR '10, p. 35.



As the number of multimedia documents on the web and at home steadily increases, so does the need for reliable semantic indexing methods that assign multiple keywords to a document. The performance of existing approaches is often measured with standard evaluation measures from the information retrieval community. In a case study on image annotation, we show the behaviour of 13 different evaluation measures and point out their strengths and weaknesses. The analysis uses data from 19 research groups that participated in the ImageCLEF Photo Annotation Task, together with several configurations based on random numbers. We investigate a recently proposed ontology-based measure, which incorporates structure information, relationships from the ontology and the agreement between annotators for a concept, and compare it to a hierarchical variant. The results for the hierarchical measure are not competitive. The ontology-based measure assigns good scores to the systems that also ranked well under the other measures, such as the example-based F-measure. For concept-based evaluation, MAP gave stable results with respect to random numbers and the number of annotated labels. The AUC measure shows good evaluation characteristics when all annotations contain confidence values.
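For readers unfamiliar with the example-based F-measure mentioned in the abstract, the following is a minimal sketch of one common formulation: F1 is computed per document from its true and predicted label sets and then averaged over all documents. This is not taken from the paper; the function name `example_based_f1`, the toy labels, and the convention that two empty label sets score 1.0 are assumptions made for illustration.

```python
def example_based_f1(true_labels, predicted_labels):
    """Average the per-document F1 score over a collection of documents.

    Each argument is a sequence of label collections, one per document.
    Hypothetical helper for illustration; not the paper's implementation.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both inputs must cover the same documents")

    scores = []
    for truth, pred in zip(true_labels, predicted_labels):
        truth, pred = set(truth), set(pred)
        if not truth and not pred:
            # Assumption: nothing to annotate and nothing predicted counts as perfect.
            scores.append(1.0)
            continue
        overlap = len(truth & pred)
        # F1 = 2 * |T ∩ P| / (|T| + |P|), the harmonic mean of precision and recall.
        scores.append(2 * overlap / (len(truth) + len(pred)))

    return sum(scores) / len(scores) if scores else 0.0


# Toy annotations for two images (made-up labels).
truth = [{"sky", "sea"}, {"dog"}]
pred = [{"sky"}, {"dog", "cat"}]
score = example_based_f1(truth, pred)
```

Because the score is averaged per document, it rewards systems that annotate each individual image well, in contrast to concept-based measures such as MAP, which are computed per label.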
