Hou, Yuexian; He, Liang; Zhao, Xiaozhao and Song, Dawei
(2011).
Pure High-order Word Dependence Mining via Information Geometry.
In: The 3rd International Conference on the Theory of Information Retrieval (ICTIR2011), 12-14 September 2011, Bertinoro, Italy.
Abstract
The classical bag-of-words models fail to capture contextual associations between words. We propose to investigate the “high-order pure dependence” among a number of words forming a semantic entity, i.e., the high-order dependence that cannot be reduced to the random coincidence of lower-order dependence. We believe that identifying these high-order pure dependence patterns will lead to a better representation of documents. We first present two formal definitions of pure dependence: Unconditional Pure Dependence (UPD) and Conditional Pure Dependence (CPD). Deciding UPD or CPD, however, is an NP-hard problem. We hence prove a series of sufficient criteria that entail UPD and CPD, within the well-principled Information Geometry (IG) framework, leading to a more feasible UPD/CPD identification procedure. We further develop novel methods to extract word patterns with high-order pure dependence, which can then be used to extend the original unigram document models. Our methods are evaluated in the context of query expansion. Compared with the original unigram model and its extensions with term associations derived from constant n-grams and Apriori association rule mining, our IG-based methods have proved mathematically more rigorous and empirically more effective.
| Item Type: | Conference Item |
| Copyright Holders: | 2011 Springer |
| Extra Information: | Received best paper award. Published in: ICTIR'11: Proceedings of the Third International Conference on Advances in Information Retrieval Theory, pages 64-76, Springer, ISBN: 978-3-642-23317-3. |
| Academic Unit/Department: | Mathematics, Computing and Technology > Computing |
| Item ID: | 34683 |
| Depositing User: | Dawei Song |
| Date Deposited: | 16 Oct 2012 11:02 |
| Last Modified: | 23 Oct 2012 17:15 |
| URI: | http://oro.open.ac.uk/id/eprint/34683 |