A Bayesian mixture model for term re-occurrence and burstiness

Sarkar, Avik; Garthwaite, Paul and De Roeck, Anne (2005). A Bayesian mixture model for term re-occurrence and burstiness. In: Ninth Conference on Computational Language Learning (CoNLL), 29-30 Jun 2005, Ann Arbor, Michigan, USA, pp. 48–55.

DOI: https://doi.org/10.3115/1706543.1706552


This paper proposes a model for term re-occurrence in a text collection based on the gaps between successive occurrences of a term. These gaps are modeled using a mixture of exponential distributions. Parameter estimation is based on a Bayesian framework that allows us to fit a flexible model. The model provides measures of a term’s re-occurrence rate and within-document burstiness. The model works for all kinds of terms, whether rare content words, medium-frequency terms or frequent function words. A measure is proposed to account for the term’s importance based on its distribution pattern in the corpus.
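
As a rough illustration of the idea in the abstract, the sketch below extracts the gaps between successive occurrences of a term and evaluates a mixture-of-exponentials density over them. It is not the paper's implementation: the two-component form, the parameter names `p`, `lam1` and `lam2`, and the helper names `gaps`, `mixture_pdf` and `log_likelihood` are all assumptions made for this example, and the Bayesian estimation step is left out.

```python
import math

def gaps(tokens, term):
    """Return the distances between successive occurrences of `term`."""
    positions = [i for i, tok in enumerate(tokens) if tok == term]
    return [b - a for a, b in zip(positions, positions[1:])]

def mixture_pdf(x, p, lam1, lam2):
    """Density at gap x of an assumed two-component exponential mixture.

    p is the weight of the first component; lam1 and lam2 are the
    rates of the two exponential components (hypothetical names).
    """
    return p * lam1 * math.exp(-lam1 * x) + (1 - p) * lam2 * math.exp(-lam2 * x)

def log_likelihood(gap_list, p, lam1, lam2):
    """Log-likelihood of the observed gaps under the assumed mixture."""
    return sum(math.log(mixture_pdf(g, p, lam1, lam2)) for g in gap_list)

# Toy text, invented for illustration only.
tokens = "the cat saw the dog and the cat ran".split()
term_gaps = gaps(tokens, "the")
score = log_likelihood(term_gaps, p=0.5, lam1=0.1, lam2=1.0)
```

Under this reading, one component with a large rate would capture short gaps (bursty repetition within a document) and one with a small rate would capture long gaps (the overall re-occurrence rate), but that interpretation is also an assumption rather than something the abstract states.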
