Zhang, Peng; Hao, Linxue; Song, Dawei; Wang, Jun; Hou, Yuexian and Hu, Bin
(2014).
DOI: https://doi.org/10.1145/2661829.2661934
URL: http://cikm2014.fudan.edu.cn
Abstract
Recent research has shown that improving mean retrieval effectiveness (e.g., MAP) may sacrifice retrieval stability across queries, implying a tradeoff between effectiveness and stability. The evaluation of both effectiveness and stability is often based on a baseline model, which could be weak or biased. In addition, the effectiveness-stability tradeoff has not been systematically or quantitatively evaluated over the systems participating in TREC. These two problems, to some extent, limit our awareness of such a tradeoff and of its impact on the development of future IR models. In this paper, motivated by a recently proposed bias-variance based evaluation, we adopt a strong and unbiased “baseline”: a virtual target model constructed from the best performance (for each query) among all the systems participating in a retrieval task. We also propose generalized bias-variance metrics, on the basis of which we carry out a systematic and quantitative evaluation of the effectiveness-stability tradeoff over the systems participating in the TREC Ad-hoc Track (1993-1999) and Web Track (2010-2012). We observe a clear effectiveness-stability tradeoff, which has become more pronounced in recent years. This implies that, as we have pursued more effective IR systems over the years, stability has become problematic and may have been largely overlooked.
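The virtual target model described in the abstract can be illustrated with a small sketch. It assumes that each system has a per-query effectiveness score (e.g., AP) over the same ordered query set, and it takes the per-query maximum across systems as the target. For the bias and variance it uses the classic decomposition of mean squared error: bias is the mean gap between target and system, variance is the population variance of that gap. This is an illustrative assumption only; it is not the paper's generalized bias-variance metrics, whose exact definitions are not given here, and the function names `virtual_target`, `bias_variance` and `mean_squared_error` are hypothetical.

```python
from statistics import fmean, pvariance


def virtual_target(scores):
    """Per-query best score across all systems (hypothetical helper).

    `scores` maps a system name to its list of per-query scores,
    all lists aligned on the same query order.
    """
    # zip(*...) regroups the lists so each tuple holds one query's scores
    return [max(per_query) for per_query in zip(*scores.values())]


def bias_variance(system_scores, target):
    """Assumed metrics: mean gap to the target and population variance of the gap."""
    gaps = [t - s for t, s in zip(target, system_scores)]
    bias = fmean(gaps)                    # how far below the target on average
    variance = pvariance(gaps, mu=bias)   # how unevenly that gap spreads over queries
    return bias, variance


def mean_squared_error(system_scores, target):
    """Mean squared gap; equals bias**2 + variance under the definitions above."""
    return fmean((t - s) ** 2 for t, s in zip(target, system_scores))


if __name__ == "__main__":
    # Toy per-query scores for three hypothetical systems over three queries
    runs = {"A": [0.2, 0.5, 0.4], "B": [0.3, 0.1, 0.6], "C": [0.25, 0.45, 0.35]}
    target = virtual_target(runs)
    for name, per_query in runs.items():
        b, v = bias_variance(per_query, target)
        print(f"{name}: bias={b:.4f} variance={v:.4f}")
```

In this toy data, systems A and B have similar mean gaps to the target, but B's gap is concentrated on a single query, so its variance is higher. A lower bias with a higher variance is one way the effectiveness-stability tradeoff can show up under these assumed definitions.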
About
- Item ORO ID
- 40779
- Item Type
- Conference or Workshop Item
- ISBN
- 1-4503-2598-X, 978-1-4503-2598-1
- Project Funding Details
- Natural Science Foundation of China, project ID 61402324 (project name not set)
- Natural Science Foundation of China, project ID 61272265 (project name not set)
- Natural Science Foundation of China, project ID 61105072 (project name not set)
- Chinese National Program on Key Basic Research Project (973 Program), project ID 2013CB329304
- Chinese National Program on Key Basic Research Project (973 Program), project ID 2014CB744604
- Keywords
- evaluation; effectiveness-stability tradeoff; bias-variance tradeoff; virtual target model
- Academic Unit or School
- Faculty of Science, Technology, Engineering and Mathematics (STEM) > Computing and Communications
- Faculty of Science, Technology, Engineering and Mathematics (STEM)
- Copyright Holders
- © 2014 ACM
- Depositing User
- Dawei Song