The Open University

A Bayesian mixture model for term re-occurrence and burstiness

Sarkar, Avik; Garthwaite, Paul and De Roeck, Anne (2005). A Bayesian mixture model for term re-occurrence and burstiness. In: Ninth Conference on Computational Language Learning (CoNLL), 29-30 June 2005, Ann Arbor, Michigan, USA.



This paper proposes a model for term re-occurrence in a text collection based on the gaps between successive occurrences of a term. These gaps are modeled using a mixture of exponential distributions. Parameter estimation is based on a Bayesian framework that allows us to fit a flexible model. The model provides measures of a term's re-occurrence rate and within-document burstiness. The model works for all kinds of terms, be it a rare content word, a medium-frequency term or a frequent function word. A measure is proposed to account for the term's importance based on its distribution pattern in the corpus.
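The abstract describes the gaps between successive occurrences of a term, modeled as a mixture of exponentials and fitted in a Bayesian way. The sketch below illustrates that idea under several assumptions that are not taken from the paper. It uses exactly two components, f(x) = p·λ1·exp(−λ1·x) + (1 − p)·λ2·exp(−λ2·x), with conjugate priors p ~ Beta(a, b) and λk ~ Gamma(shape, rate), and a plain Gibbs sampler. A gap is the token-position difference between consecutive occurrences within one document, and censoring (text before the first occurrence or after the last) is ignored. The helper names `gaps_between_occurrences` and `gibbs_exp_mixture` are hypothetical, and this is not the authors' estimation procedure. Reading the larger rate λ1 as within-burst re-occurrence and the smaller rate λ2 as the longer gaps between bursts is likewise only an interpretation.

```python
import math
import random
from statistics import fmean


def gaps_between_occurrences(tokens, term):
    """Hypothetical helper: token gaps between successive occurrences of `term`."""
    positions = [i for i, tok in enumerate(tokens) if tok == term]
    return [b - a for a, b in zip(positions, positions[1:])]


def gibbs_exp_mixture(gaps, n_iter=1000, burn_in=200, seed=0,
                      a=1.0, b=1.0, shape=1.0, rate=1.0):
    """Hypothetical Gibbs sampler for p*Exp(lam1) + (1-p)*Exp(lam2).

    Assumed priors: p ~ Beta(a, b), lam_k ~ Gamma(shape, rate).
    Returns posterior means (p, lam1, lam2), with lam1 >= lam2 enforced.
    """
    if len(gaps) < 2:
        raise ValueError("need at least two gaps")
    rng = random.Random(seed)
    xs = sorted(gaps)
    half = len(xs) // 2
    # Initialise from the short and long halves of the observed gaps.
    lam1 = 1.0 / fmean(xs[:half])   # short gaps -> larger rate
    lam2 = 1.0 / fmean(xs[half:])   # long gaps -> smaller rate
    p = 0.5
    draws = []
    for it in range(n_iter):
        n1 = n2 = 0
        s1 = s2 = 0.0
        for x in xs:
            # Log of weight * density for each component at gap x.
            l1 = math.log(p) + math.log(lam1) - lam1 * x
            l2 = math.log(1.0 - p) + math.log(lam2) - lam2 * x
            d = l2 - l1
            # Numerically stable P(component 1 | x) = 1 / (1 + exp(d)).
            if d <= 0:
                prob1 = 1.0 / (1.0 + math.exp(d))
            else:
                prob1 = math.exp(-d) / (1.0 + math.exp(-d))
            if rng.random() < prob1:
                n1 += 1
                s1 += x
            else:
                n2 += 1
                s2 += x
        # Conjugate updates given the sampled component labels.
        p = rng.betavariate(a + n1, b + n2)
        # gammavariate takes (shape, scale), so scale = 1 / posterior rate.
        lam1 = rng.gammavariate(shape + n1, 1.0 / (rate + s1))
        lam2 = rng.gammavariate(shape + n2, 1.0 / (rate + s2))
        if lam1 < lam2:
            # Relabel so component 1 is always the faster (bursty) one.
            lam1, lam2, p = lam2, lam1, 1.0 - p
        if it >= burn_in:
            draws.append((p, lam1, lam2))
    return (fmean(d[0] for d in draws),
            fmean(d[1] for d in draws),
            fmean(d[2] for d in draws))
```

For a real term, the gaps would be collected with `gaps_between_occurrences` in each document and then pooled across the collection before calling `gibbs_exp_mixture`. The pooling is a simplifying assumption. The relabelling step is a common, crude guard against label switching between the two components.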

Item Type: Conference Item
Keywords: term distribution modelling; term burstiness; natural language processing; Bayesian modelling
Academic Unit/Department: Mathematics, Computing and Technology > Computing & Communications
Mathematics, Computing and Technology
Mathematics, Computing and Technology > Mathematics and Statistics
Other Departments > Vice-Chancellor's Office
Other Departments
Interdisciplinary Research Centre: Centre for Research in Computing (CRC)
Item ID: 5003
Depositing User: Anne De Roeck
Date Deposited: 18 Jul 2006
Last Modified: 24 Feb 2016 04:59
