Hou, Yuexian; He, Liang; Zhao, Xiaozhao and Song, Dawei
Pure High-order Word Dependence Mining via Information Geometry.
In: The 3rd International Conference on the Theory of Information Retrieval (ICTIR2011), 12-14 September 2011, Bertinoro, Italy.
The classical bag-of-words models fail to capture contextual associations between words. We propose to investigate the “high-order pure dependence” among a number of words forming a semantic entity, i.e., the high-order dependence that cannot be reduced to the random coincidence of lower-order dependence. We believe that identifying these high-order pure dependence patterns will lead to a better representation of documents. We first present two formal definitions of pure dependence: Unconditional Pure Dependence (UPD) and Conditional Pure Dependence (CPD). The decision on UPD or CPD, however, is an NP-hard problem. We hence prove a series of sufficient criteria that entail UPD and CPD, within the well-principled Information Geometry (IG) framework, leading to a more feasible UPD/CPD identification procedure. We further develop novel methods to extract word patterns with high-order pure dependence, which can then be used to extend the original unigram document models. Our methods are evaluated in the context of query expansion. Compared with the original unigram model and its extensions with term associations derived from constant n-grams and Apriori association rule mining, our IG-based methods have proved mathematically more rigorous and empirically more effective.
Received best paper award.
Published in: ICTIR'11: Proceedings of the Third International Conference on Advances in Information Retrieval Theory, pages 64-76, Springer, ISBN: 978-3-642-23317-3.
Mathematics, Computing and Technology > Computing & Communications
16 Oct 2012 11:02
23 Oct 2012 17:15