Hou, Yuexian; Zhao, Ziaozhao; Song, Dawei and Li, Weijie
(2013).
DOI: https://doi.org/10.1145/2493175.2493177
Abstract
The classical bag-of-words models for Information Retrieval (IR) fail to capture contextual associations between words. In this paper, we propose to investigate the “pure high-order dependence” among a number of words forming an inseparable semantic entity, i.e., the high-order dependence that cannot be reduced to the random coincidence of lower-order dependencies. We believe that identifying these pure high-order dependence patterns will lead to a better representation of documents and novel retrieval models. Specifically, two formal definitions of pure dependence are presented: Unconditional Pure Dependence (UPD) and Conditional Pure Dependence (CPD). Deciding UPD and CPD, however, is NP-hard in general.
We therefore prove sufficient criteria that entail UPD and CPD within the well-principled Information Geometry (IG) framework, leading to a more feasible UPD/CPD identification procedure. We further develop novel methods to extract word patterns with pure high-order dependence. Our methods are applied to and extensively evaluated on three typical IR tasks: text classification, and text retrieval without and with query expansion.
About
- Item ORO ID: 36460
- Item Type: Journal Item
- ISSN: 1558-2868
- Academic Unit or School: Faculty of Science, Technology, Engineering and Mathematics (STEM) > Computing and Communications; Faculty of Science, Technology, Engineering and Mathematics (STEM)
- Copyright Holders: © 2013 ACM
- Related URLs: http://tois.acm.org/ (Other)
- Depositing User: Dawei Song