
Pure High-order Word Dependence Mining via Information Geometry

Hou, Yuexian; He, Liang; Zhao, Xiaozhao and Song, Dawei (2011). Pure High-order Word Dependence Mining via Information Geometry. In: The 3rd International Conference on the Theory of Information Retrieval (ICTIR2011), 12-14 September 2011, Bertinoro, Italy.

DOI (Digital Object Identifier) Link: http://dx.doi.org/10.1007/978-3-642-23318-0_8

Abstract

Classical bag-of-words models fail to capture contextual associations between words. We propose to investigate the “high-order pure dependence” among a number of words forming a semantic entity, i.e., the high-order dependence that cannot be reduced to the random coincidence of lower-order dependence. We believe that identifying these high-order pure dependence patterns will lead to a better representation of documents. We first present two formal definitions of pure dependence: Unconditional Pure Dependence (UPD) and Conditional Pure Dependence (CPD). Deciding UPD or CPD, however, is an NP-hard problem. We hence prove a series of sufficient criteria that entail UPD and CPD, within the well-principled Information Geometry (IG) framework, leading to a more feasible UPD/CPD identification procedure. We further develop novel methods to extract word patterns with high-order pure dependence, which can then be used to extend the original unigram document models. Our methods are evaluated in the context of query expansion. Compared with the original unigram model and its extensions with term associations derived from constant n-grams and Apriori association rule mining, our IG-based methods have proved mathematically more rigorous and empirically more effective.
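
The idea of a dependence that "cannot be reduced to lower-order dependence" can be illustrated with a standard quantity from information geometry: the third-order interaction coefficient of a log-linear model over three binary variables. The sketch below is only an illustration under stated assumptions and is not the UPD/CPD criteria proved in the paper, which the abstract does not spell out. The function name `third_order_theta`, the binary word-occurrence encoding, and the toy distributions are all hypothetical.

```python
import math
from itertools import product


def third_order_theta(p):
    """Third-order log-linear interaction coefficient for three binary variables.

    `p` maps each tuple (x1, x2, x3) in {0, 1}^3 to a strictly positive
    probability. The value is 0 whenever the joint distribution can be
    written using only lower-order (single and pairwise) log-linear terms.
    NOTE: illustrative assumption, not the criteria from the paper.
    """
    num = p[(1, 1, 1)] * p[(1, 0, 0)] * p[(0, 1, 0)] * p[(0, 0, 1)]
    den = p[(1, 1, 0)] * p[(1, 0, 1)] * p[(0, 1, 1)] * p[(0, 0, 0)]
    return math.log(num / den)


# Toy case 1: three independent "word occurrence" indicators.
marginals = [0.2, 0.5, 0.7]  # hypothetical P(word_i present)
independent = {
    x: math.prod(m if xi else 1.0 - m for xi, m in zip(x, marginals))
    for x in product((0, 1), repeat=3)
}

# Toy case 2: a parity-biased distribution. Every pair of variables is
# independent, yet the three variables are jointly dependent.
eps = 0.5
parity = {
    x: (1.0 + eps * (-1) ** sum(x)) / 8.0
    for x in product((0, 1), repeat=3)
}

theta_independent = third_order_theta(independent)
theta_parity = third_order_theta(parity)
```

In the first toy case the coefficient vanishes (up to floating-point error), while in the second it equals 4 * log((1 - eps) / (1 + eps)), which is nonzero even though all pairwise marginals are uniform. That is the kind of purely third-order structure that pairwise statistics alone cannot detect.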

Item Type: Conference Item
Copyright Holders: 2011 Springer
Extra Information: Received best paper award.
Published in: ICTIR'11: Proceedings of the Third International Conference on Advances in Information Retrieval Theory, pages 64-76, Springer, ISBN: 978-3-642-23317-3.
Academic Unit/Department: Mathematics, Computing and Technology > Computing & Communications
Item ID: 34683
Depositing User: Dawei Song
Date Deposited: 16 Oct 2012 11:02
Last Modified: 23 Oct 2012 17:15
URI: http://oro.open.ac.uk/id/eprint/34683