The Open UniversitySkip to content
 

Generalized Analysis of a Distribution Separation Method

Zhang, Peng; Yu, Qian; Hou, Yuexian; Song, Dawei; Li, Jingfei and Hu, Bin (2016). Generalized Analysis of a Distribution Separation Method. Entropy, 18(4), article no. 105.

Full text available as:
[img]
Preview
PDF (Version of Record) - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Download (471kB) | Preview
DOI (Digital Object Identifier) Link: https://doi.org/10.3390/e18040105
Google Scholar: Look up in Google Scholar

Abstract

Separating two probability distributions from a mixture model that is made up of the combinations of the two is essential to a wide range of applications. For example, in information retrieval (IR), there often exists a mixture distribution consisting of a relevance distribution that we need to estimate and an irrelevance distribution that we hope to get rid of. Recently, a distribution separation method (DSM) was proposed to approximate the relevance distribution, by separating a seed irrelevance distribution from the mixture distribution. It was successfully applied to an IR task, namely pseudo-relevance feedback (PRF), where the query expansion model is often a mixture term distribution. Although initially developed in the context of IR, DSM is indeed a general mathematical formulation for probability distribution separation. Thus, it is important to further generalize its basic analysis and to explore its connections to other related methods. In this article, we first extend DSM’s theoretical analysis, which was originally based on the Pearson correlation coefficient, to entropy-related measures, including the KL-divergence (Kullback–Leibler divergence), the symmetrized KL-divergence and the JS-divergence (Jensen–Shannon divergence). Second, we investigate the distribution separation idea in a well-known method, namely the mixture model feedback (MMF) approach. We prove that MMF also complies with the linear combination assumption, and then, DSM’s linear separation algorithm can largely simplify the EM algorithm in MMF. These theoretical analyses, as well as further empirical evaluation results demonstrate the advantages of our DSM approach.

Item Type: Journal Item
Copyright Holders: 2016 The Authors
ISSN: 1099-4300
Keywords: information retrieval; distribution separation; KL-divergence; mixture model
Academic Unit/School: Faculty of Science, Technology, Engineering and Mathematics (STEM) > Computing and Communications
Faculty of Science, Technology, Engineering and Mathematics (STEM)
Item ID: 47210
Depositing User: Dawei Song
Date Deposited: 06 Sep 2016 14:07
Last Modified: 07 Dec 2018 22:54
URI: http://oro.open.ac.uk/id/eprint/47210
Share this page:

Metrics

Altmetrics from Altmetric

Citations from Dimensions

Download history for this item

These details should be considered as only a guide to the number of downloads performed manually. Algorithmic methods have been applied in an attempt to remove automated downloads from the displayed statistics but no guarantee can be made as to the accuracy of the figures.

Actions (login may be required)

Policies | Disclaimer

© The Open University   contact the OU