Hou, Yuexian; Zhao, Xiaozhao; Song, Dawei and Li, Weijie
(2013).
DOI: https://doi.org/10.1145/2493175.2493177
Abstract
The classical bag-of-words models for Information Retrieval (IR) fail to capture contextual associations between words. In this paper, we investigate the “pure high-order dependence” among a number of words forming an inseparable semantic entity, i.e., the high-order dependence that cannot be reduced to the random coincidence of lower-order dependencies. We believe that identifying these pure high-order dependence patterns will lead to a better representation of documents and to novel retrieval models. Specifically, we present two formal definitions of pure dependence: Unconditional Pure Dependence (UPD) and Conditional Pure Dependence (CPD). Deciding UPD and CPD, however, is NP-hard in general.
We therefore prove sufficient criteria that entail UPD and CPD within the well-principled Information Geometry (IG) framework, leading to a more feasible UPD/CPD identification procedure. We further develop novel methods to extract word patterns with pure high-order dependence. Our methods are applied to and extensively evaluated on three typical IR tasks: text classification, text retrieval without query expansion, and text retrieval with query expansion.