The Open University

Pure High-order Word Dependence Mining via Information Geometry

Hou, Yuexian; He, Liang; Zhao, Xiaozhao and Song, Dawei (2011). Pure High-order Word Dependence Mining via Information Geometry. In: The 3rd International Conference on the Theory of Information Retrieval (ICTIR2011), 12-14 September 2011, Bertinoro, Italy.



Classical bag-of-words models fail to capture contextual associations between words. We propose to investigate the “high-order pure dependence” among a number of words forming a semantic entity, i.e., the high-order dependence that cannot be reduced to the random coincidence of lower-order dependence. We believe that identifying these high-order pure dependence patterns will lead to a better representation of documents. We first present two formal definitions of pure dependence: Unconditional Pure Dependence (UPD) and Conditional Pure Dependence (CPD). Deciding UPD or CPD, however, is an NP-hard problem. We hence prove a series of sufficient criteria that entail UPD and CPD, within the well-principled Information Geometry (IG) framework, leading to a more feasible UPD/CPD identification procedure. We further develop novel methods to extract word patterns with high-order pure dependence, which can then be used to extend the original unigram document models. Our methods are evaluated in the context of query expansion. Compared with the original unigram model and its extensions with term associations derived from constant n-grams and Apriori association rule mining, our IG-based methods prove mathematically more rigorous and empirically more effective.
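To give a feel for dependence that "cannot be reduced to lower-order dependence", the following minimal sketch computes the standard third-order interaction coordinate of a log-linear model over three binary word-occurrence variables. This is a textbook information-geometric quantity used here purely as an illustration; it is not the paper's UPD/CPD criterion, and the function names, the smoothed parity distribution, and the value of `eps` are all assumptions introduced for the example.

```python
import itertools
import math

STATES = list(itertools.product((0, 1), repeat=3))


def third_order_theta(p):
    """Third-order interaction of a strictly positive joint over {0,1}^3.

    Assumes the log-linear form
      log p(x) = sum_i th_i x_i + sum_{i<j} th_ij x_i x_j + th_123 x1 x2 x3 - psi,
    from which th_123 is a log ratio of the eight cell probabilities.
    """
    num = p[(1, 1, 1)] * p[(1, 0, 0)] * p[(0, 1, 0)] * p[(0, 0, 1)]
    den = p[(1, 1, 0)] * p[(1, 0, 1)] * p[(0, 1, 1)] * p[(0, 0, 0)]
    return math.log(num / den)


def independent_joint(q1, q2, q3):
    """Joint in which the three words occur independently (hypothetical rates)."""
    qs = (q1, q2, q3)
    return {
        x: math.prod(q if xi else 1.0 - q for q, xi in zip(qs, x))
        for x in STATES
    }


def smoothed_parity_joint(eps=0.1):
    """Illustrative joint: mass on even-parity triples, mixed with uniform noise.

    Every pair of variables is independent, yet the triple is not.
    """
    return {
        x: (1.0 - eps) * (0.25 if sum(x) % 2 == 0 else 0.0) + eps / 8.0
        for x in STATES
    }


def pair_marginal(p, i, j):
    """P(x_i = 1, x_j = 1) under joint p."""
    return sum(prob for x, prob in p.items() if x[i] == 1 and x[j] == 1)


def single_marginal(p, i):
    """P(x_i = 1) under joint p."""
    return sum(prob for x, prob in p.items() if x[i] == 1)


if __name__ == "__main__":
    indep = independent_joint(0.2, 0.5, 0.7)
    parity = smoothed_parity_joint()
    print("theta_123, independent words:", third_order_theta(indep))
    print("theta_123, smoothed parity  :", third_order_theta(parity))
    print("P(x1=1, x2=1) vs P(x1=1)P(x2=1):",
          pair_marginal(parity, 0, 1),
          single_marginal(parity, 0) * single_marginal(parity, 1))
```

In the parity example every pairwise marginal factorises, so no second-order statistic reveals the association, while the third-order coordinate is far from zero. That is the kind of pattern a purely pairwise association measure would miss.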

Item Type: Conference Item
Copyright Holders: 2011 Springer
Extra Information: Received best paper award.
Published in: ICTIR'11: Proceedings of the Third International Conference on Advances in Information Retrieval Theory, pages 64-76, Springer, ISBN: 978-3-642-23317-3.
Academic Unit/Department: Mathematics, Computing and Technology > Computing & Communications
Mathematics, Computing and Technology
Item ID: 34683
Depositing User: Dawei Song
Date Deposited: 16 Oct 2012 11:02
Last Modified: 25 Feb 2016 06:28