
Pure High-order Word Dependence Mining via Information Geometry

Hou, Yuexian; He, Liang; Zhao, Xiaozhao and Song, Dawei (2011). Pure High-order Word Dependence Mining via Information Geometry. In: The 3rd International Conference on the Theory of Information Retrieval (ICTIR2011), 12-14 September 2011, Bertinoro, Italy.

DOI (Digital Object Identifier) Link: http://doi.org/10.1007/978-3-642-23318-0_8

Abstract

Classical bag-of-words models fail to capture contextual associations between words. We propose to investigate the “high-order pure dependence” among a number of words forming a semantic entity, i.e., the high-order dependence that cannot be reduced to the random coincidence of lower-order dependence. We believe that identifying these high-order pure dependence patterns will lead to a better representation of documents. We first present two formal definitions of pure dependence: Unconditional Pure Dependence (UPD) and Conditional Pure Dependence (CPD). Deciding UPD or CPD, however, is an NP-hard problem. We hence prove a series of sufficient criteria that entail UPD and CPD, within the well-principled Information Geometry (IG) framework, leading to a more feasible UPD/CPD identification procedure. We further develop novel methods to extract word patterns with high-order pure dependence, which can then be used to extend the original unigram document models. Our methods are evaluated in the context of query expansion. Compared with the original unigram model and its extensions with term associations derived from constant n-grams and Apriori association rule mining, our IG-based methods have proved mathematically more rigorous and empirically more effective.
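To make the idea of dependence that "cannot be reduced to lower-order dependence" concrete, the sketch below computes the third-order interaction coordinate of the standard log-linear expansion for three binary word-occurrence variables, a quantity commonly used in Information Geometry. This is an illustration under stated assumptions, not the paper's UPD/CPD criteria, which the abstract does not spell out. The function names (third_order_theta, independent_joint) and the example distributions are hypothetical.

```python
import math
from itertools import product


def third_order_theta(p):
    """Third-order interaction coordinate of a log-linear model.

    `p` maps each (x1, x2, x3) in {0, 1}^3 to a strictly positive
    probability. The value is zero when the joint distribution has no
    three-way interaction in the log-linear sense.
    Illustrative only: not the UPD/CPD test from the paper.
    """
    num = p[(1, 1, 1)] * p[(1, 0, 0)] * p[(0, 1, 0)] * p[(0, 0, 1)]
    den = p[(1, 1, 0)] * p[(1, 0, 1)] * p[(0, 1, 1)] * p[(0, 0, 0)]
    return math.log(num / den)


def independent_joint(m1, m2, m3):
    """Joint distribution of three independent binary variables with
    occurrence probabilities m1, m2, m3 (hypothetical helper)."""
    joint = {}
    for x in product((0, 1), repeat=3):
        prob = 1.0
        for xi, mi in zip(x, (m1, m2, m3)):
            prob *= mi if xi == 1 else 1.0 - mi
        joint[x] = prob
    return joint


# Hypothetical parity-biased distribution: every pair of variables is
# exactly independent, yet the three variables are jointly dependent.
parity_joint = {
    x: (0.2 if sum(x) % 2 == 1 else 0.05)
    for x in product((0, 1), repeat=3)
}

theta_independent = third_order_theta(independent_joint(0.3, 0.5, 0.7))
theta_parity = third_order_theta(parity_joint)
```

In the parity-biased example every pairwise marginal is uniform, so no pairwise statistic reveals the association, while the third-order coordinate is clearly nonzero. For independent variables the coordinate vanishes up to floating-point error.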

Item Type: Conference Item
Copyright Holders: 2011 Springer
Extra Information: Received best paper award.
Published in: ICTIR'11: Proceedings of the Third International Conference on Advances in Information Retrieval Theory, pages 64-76, Springer, ISBN: 978-3-642-23317-3.
Academic Unit/Department: Mathematics, Computing and Technology > Computing & Communications
Item ID: 34683
Depositing User: Dawei Song
Date Deposited: 16 Oct 2012 11:02
Last Modified: 25 Feb 2016 06:28
URI: http://oro.open.ac.uk/id/eprint/34683