A Bayesian mixture model for term re-occurrence and burstiness

Sarkar, Avik; Garthwaite, Paul and De Roeck, Anne (2005). A Bayesian mixture model for term re-occurrence and burstiness. In: Ninth Conference on Computational Language Learning (CoNLL), 29-30 June 2005, Ann Arbor, Michigan, USA.

Full text available as:
URL: http://acl.ldc.upenn.edu/W/W05/W05-0607.pdf

Abstract

This paper proposes a model for term re-occurrence in a text collection based on the gaps between successive occurrences of a term. These gaps are modeled using a mixture of exponential distributions. Parameter estimation is based on a Bayesian framework that allows us to fit a flexible model. The model provides measures of a term’s re-occurrence rate and within-document burstiness. The model works for all kinds of terms, whether rare content words, medium-frequency terms or frequent function words. A measure is proposed to account for a term’s importance based on its distribution pattern in the corpus.
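
As a rough illustration of the idea in the abstract, the sketch below extracts the gaps between successive occurrences of a term from a token list and evaluates an exponential-mixture density on them. It is only a sketch under stated assumptions: the abstract does not give the number of mixture components or the parameterisation, so the two-component form and the names `term_gaps`, `mixture_pdf`, `log_likelihood`, `p`, `rate1` and `rate2` are hypothetical, and the Bayesian parameter estimation described in the paper is not reproduced here.

```python
import math


def term_gaps(tokens, term):
    """Return the gaps (in tokens) between successive occurrences of `term`."""
    positions = [i for i, tok in enumerate(tokens) if tok == term]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


def mixture_pdf(gap, p, rate1, rate2):
    """Density of an assumed two-component exponential mixture at `gap`.

    `p` is the weight of the first component; `rate1` and `rate2` are the
    rates of the two exponentials (illustrative names, not from the paper).
    """
    return (p * rate1 * math.exp(-rate1 * gap)
            + (1.0 - p) * rate2 * math.exp(-rate2 * gap))


def log_likelihood(gaps, p, rate1, rate2):
    """Log-likelihood of the observed gaps under the assumed mixture."""
    return sum(math.log(mixture_pdf(g, p, rate1, rate2)) for g in gaps)


if __name__ == "__main__":
    tokens = "the model fits the gaps between the terms".split()
    gaps = term_gaps(tokens, "the")
    print(gaps)
    print(log_likelihood(gaps, p=0.5, rate1=0.1, rate2=1.0))
```

One possible reading of such a mixture is that a component with a large rate captures short gaps (bursty re-use) while a component with a small rate captures long gaps, though how the paper maps its parameters to re-occurrence rate and burstiness should be taken from the full text.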

Item Type: Conference Item
Keywords: term distribution modelling; term burstiness; natural language processing; Bayesian modelling
Academic Unit/Department: Mathematics, Computing and Technology > Computing & Communications
Mathematics, Computing and Technology > Mathematics and Statistics
Mathematics, Computing and Technology
Interdisciplinary Research Centre: Centre for Research in Computing (CRC)
Item ID: 5003
Depositing User: Anne De Roeck
Date Deposited: 18 Jul 2006
Last Modified: 04 Dec 2010 04:46
URI: http://oro.open.ac.uk/id/eprint/5003