Mining pure high-order word associations via information geometry for information retrieval

Hou, Yuexian; Zhao, Ziaozhao; Song, Dawei and Li, Weijie (2013). Mining pure high-order word associations via information geometry for information retrieval. ACM Transactions on Information Systems (TOIS), 31(3) 12:1-12:32.

DOI: https://doi.org/10.1145/2493175.2493177

Abstract

The classical bag-of-words models for Information Retrieval (IR) fail to capture contextual associations between words. In this paper, we investigate the “pure high-order dependence” among a number of words that form an inseparable semantic entity, i.e., high-order dependence that cannot be reduced to the random coincidence of lower-order dependencies. We believe that identifying these pure high-order dependence patterns will lead to better document representations and novel retrieval models. Specifically, we present two formal definitions of pure dependence: Unconditional Pure Dependence (UPD) and Conditional Pure Dependence (CPD). Deciding UPD and CPD, however, is NP-hard in general.

We therefore prove sufficient criteria that entail UPD and CPD within the well-principled Information Geometry (IG) framework, which leads to a more tractable UPD/CPD identification procedure. We further develop novel methods to extract word patterns with pure high-order dependence. Our methods are applied to, and extensively evaluated on, three typical IR tasks: text classification, and text retrieval with and without query expansion.
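The following sketch illustrates what a "pure" high-order dependence looks like. It is not the authors' UPD/CPD criterion. It assumes three binary word-occurrence indicators drawn from a hypothetical "noisy parity" distribution. It then uses the standard log-linear (θ) coordinates of information geometry, where the highest-order coefficient θ_123 measures a three-way interaction that pairwise statistics miss. All function names are illustrative.

```python
import itertools
import math


def noisy_parity(eps):
    """Hypothetical joint distribution over three binary word indicators.

    p(x) = (1 + eps * (-1)**(x1 + x2 + x3)) / 8. Every pairwise marginal is
    uniform, so no pair of words is dependent, yet the triple is.
    """
    return {x: (1 + eps * (-1) ** sum(x)) / 8
            for x in itertools.product((0, 1), repeat=3)}


def marginal(p, idx):
    """Marginalise a joint distribution onto the variables listed in idx."""
    out = {}
    for x, px in p.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + px
    return out


def mutual_information(p, i, j):
    """Pairwise mutual information I(x_i; x_j) in nats."""
    pij = marginal(p, (i, j))
    pi, pj = marginal(p, (i,)), marginal(p, (j,))
    return sum(v * math.log(v / (pi[(a,)] * pj[(b,)]))
               for (a, b), v in pij.items() if v > 0)


def theta(p, subset, n=3):
    """Log-linear (theta) coordinate for `subset` via Moebius inversion.

    theta_S = sum over T subset of S of (-1)**(|S| - |T|) * log p(1_T),
    where 1_T is the binary vector with ones exactly on T.
    Requires p(x) > 0 for every x.
    """
    total = 0.0
    for r in range(len(subset) + 1):
        for t in itertools.combinations(subset, r):
            x = tuple(1 if k in t else 0 for k in range(n))
            total += (-1) ** (len(subset) - r) * math.log(p[x])
    return total


p = noisy_parity(0.5)
pairwise_mi = [mutual_information(p, i, j) for i, j in [(0, 1), (0, 2), (1, 2)]]
theta_123 = theta(p, (0, 1, 2))
print(pairwise_mi)  # every pair is independent
print(theta_123)    # nonzero three-way interaction
```

In this example every pairwise mutual information is zero, while θ_123 = 4 log((1 - ε)/(1 + ε)), which is nonzero whenever ε ≠ 0. For an independent (product) distribution, every θ_S with |S| ≥ 2 vanishes instead.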
