Approximating true relevance distribution from a mixture model based on irrelevance data

Zhang, Peng; Hou, Yuexian and Song, Dawei (2009). Approximating true relevance distribution from a mixture model based on irrelevance data. In: 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR2009), 19–23 July 2009, Boston, USA.

DOI (Digital Object Identifier) Link: http://dx.doi.org/10.1145/1571941.1571962

Abstract

Pseudo relevance feedback (PRF), which has been widely applied in IR, aims to derive a distribution from the top n pseudo-relevant documents D. However, these documents are often a mixture of relevant and irrelevant documents. As a result, the derived distribution is actually a mixture model, which has long limited the performance of PRF. This is particularly the case for difficult queries, where the truly relevant documents in D are very sparse. In this situation, it is often easier to identify a small number of seed irrelevant documents, which can form a seed irrelevance distribution. A fundamental and challenging problem then arises: based solely on the mixed distribution and a seed irrelevance distribution, how can an optimal approximation of the true relevance distribution be generated automatically? In this paper, we propose a novel distribution separation model (DSM) to tackle this problem. Theoretical justifications of the proposed algorithm are given. Evaluation results from our extensive simulated experiments on several large-scale TREC data sets demonstrate the effectiveness of our method, which outperforms a well-respected PRF model, the Relevance Model (RM), as well as the use of RM on D with the seed negative documents directly removed.
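
To make the separation problem concrete, the following is a minimal sketch, not the paper's DSM algorithm. It assumes the mixed distribution is a linear combination M = lam * R + (1 - lam) * I of a relevance distribution R and an irrelevance distribution I, and that the mixing weight lam is already known. The abstract does not say how DSM models the mixture or estimates such a weight, so the function names, the `lam` parameter, and the toy term distributions are all hypothetical.

```python
from typing import Dict

Dist = Dict[str, float]  # term -> probability


def mix(relevant: Dist, irrelevant: Dist, lam: float) -> Dist:
    """Build the mixed distribution lam * R + (1 - lam) * I (assumed linear form)."""
    vocab = set(relevant) | set(irrelevant)
    return {
        t: lam * relevant.get(t, 0.0) + (1.0 - lam) * irrelevant.get(t, 0.0)
        for t in vocab
    }


def separate(mixture: Dist, irrelevant: Dist, lam: float) -> Dist:
    """Invert the assumed linear mixture for a given weight lam.

    Negative values (possible when lam or the seed distribution is off)
    are clipped to zero, and the result is renormalised to sum to one.
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in (0, 1]")
    raw = {
        t: (p - (1.0 - lam) * irrelevant.get(t, 0.0)) / lam
        for t, p in mixture.items()
    }
    clipped = {t: max(v, 0.0) for t, v in raw.items()}
    total = sum(clipped.values())
    if total == 0.0:
        raise ValueError("nothing remains after removing the irrelevance part")
    return {t: v / total for t, v in clipped.items()}


# Toy data (hypothetical): an ambiguous query where most of D is off-topic.
true_relevant = {"jaguar": 0.5, "cat": 0.5}
seed_irrelevant = {"jaguar": 0.5, "car": 0.5}
mixed = mix(true_relevant, seed_irrelevant, lam=0.25)
recovered = separate(mixed, seed_irrelevant, lam=0.25)
```

In this toy case the weight is given and the seed distribution equals the full irrelevance component, so the inversion is exact. The problem stated in the abstract is harder: only the mixed distribution and a seed irrelevance distribution are available.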

Item Type: Conference Item
Copyright Holders: 2009 ACM
Extra Information: SIGIR '09: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: ACM, New York, NY, USA, ©2009
ISBN: 978-1-60558-483-6
DOI: 10.1145/1571941.1571962
Pages: 107-114
Academic Unit/Department: Mathematics, Computing and Technology > Computing & Communications
Item ID: 33898
Depositing User: Dawei Song
Date Deposited: 21 Jun 2012 09:53
Last Modified: 26 Oct 2012 22:07
URI: http://oro.open.ac.uk/id/eprint/33898