Hou, Yuexian; He, Liang; Zhao, Xiaozhao and Song, Dawei
PDF (Accepted Manuscript)
- Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Download (242Kb) | Preview
|DOI (Digital Object Identifier) Link:||http://doi.org/10.1007/978-3-642-23318-0_8|
|Google Scholar:||Look up in Google Scholar|
The classical bag-of-word models fail to capture contextual associations between words. We propose to investigate the “high-order pure dependence” among a number of words forming a semantic entity, i.e., the high-order dependence that cannot be reduced to the random coincidence of lower-order dependence. We believe that identifying these high-order pure dependence patterns will lead to a better representation of documents. We first present two formal definitions of pure dependence: Unconditional Pure Dependence (UPD) and Conditional Pure Depen- dence (CPD). The decision on UPD or CPD, however, is a NP-hard problem. We hence prove a series of sufficient criteria that entail UPD and CPD, within the well-principled Information Geometry (IG) framework, leading to a more feasible UPD/CPD identification procedure. We further develop novel methods to extract word patterns with high-order pure dependence, which can then be used to extend the original unigram document models. Our methods are evaluated in the context of query ex- pansion. Compared with the original unigram model and its extensions with term associations derived from constant n-grams and Apriori association rule mining, our IG-based methods have proved mathematically more rigorous and empirically more effective.
|Item Type:||Conference Item|
|Copyright Holders:||2011 Springer|
|Extra Information:||Received best paper award.
Published in: Proceedings ICTIR'11 Proceedings of the Third international conference on Advances in information retrieval theory, pages 64-76, Springer, ISBN: 978-3-642-23317-3.
|Academic Unit/Department:||Mathematics, Computing and Technology > Computing & Communications
Mathematics, Computing and Technology
|Depositing User:||Dawei Song|
|Date Deposited:||16 Oct 2012 11:02|
|Last Modified:||25 Feb 2016 06:28|
|Share this page:|