
A Bayesian mixture model for term re-occurrence and burstiness

Sarkar, Avik; Garthwaite, Paul and De Roeck, Anne (2005). A Bayesian mixture model for term re-occurrence and burstiness. In: Ninth Conference on Computational Language Learning (CoNLL), 29-30 June 2005, Ann Arbor, Michigan, USA.

Full text available as:
URL: http://acl.ldc.upenn.edu/W/W05/W05-0607.pdf

Abstract

This paper proposes a model for term re-occurrence in a text collection based on the gaps between successive occurrences of a term. These gaps are modeled using a mixture of exponential distributions. Parameter estimation is based on a Bayesian framework that allows us to fit a flexible model. The model provides measures of a term’s re-occurrence rate and within-document burstiness. The model works for all kinds of terms, whether a rare content word, a medium-frequency term or a frequent function word. A measure is proposed to account for the term’s importance based on its distribution pattern in the corpus.
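
To make the idea of modeling gaps concrete, here is a minimal Python sketch, not the authors' implementation. It measures the gaps between successive occurrences of a term in a token list and evaluates the log-likelihood of those gaps under a mixture of exponential distributions. The choice of two components, the parameter names (`p`, `lam1`, `lam2`) and the function names are assumptions made for illustration. The abstract does not specify the parameterization, and the Bayesian estimation step is not shown.

```python
import math


def gaps(tokens, term):
    """Return the gaps (in token positions) between successive occurrences of `term`."""
    positions = [i for i, tok in enumerate(tokens) if tok == term]
    return [b - a for a, b in zip(positions, positions[1:])]


def mixture_pdf(x, p, lam1, lam2):
    """Density of a two-component exponential mixture (two components is an assumption).

    p    : mixing weight of the first component, 0 <= p <= 1
    lam1 : rate of the first exponential component
    lam2 : rate of the second exponential component
    """
    return p * lam1 * math.exp(-lam1 * x) + (1 - p) * lam2 * math.exp(-lam2 * x)


def log_likelihood(gap_list, p, lam1, lam2):
    """Sum of log-densities of the observed gaps under the mixture."""
    return sum(math.log(mixture_pdf(g, p, lam1, lam2)) for g in gap_list)


if __name__ == "__main__":
    tokens = "a b a c c a".split()
    term_gaps = gaps(tokens, "a")
    print(term_gaps)
    # Hypothetical parameter values, chosen only to exercise the function.
    print(log_likelihood(term_gaps, p=0.5, lam1=1.0, lam2=0.1))
```

In a sketch like this, one component with a large rate would capture short gaps (bursty re-use of a term) and one with a small rate would capture long gaps between occurrences. How the paper itself interprets its mixture components should be checked against the full text.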

Item Type: Conference Item
Keywords: term distribution modelling; term burstiness; natural language processing; Bayesian modelling
Academic Unit/Department: Faculty of Science, Technology, Engineering and Mathematics (STEM) > Computing and Communications
Faculty of Science, Technology, Engineering and Mathematics (STEM)
Faculty of Science, Technology, Engineering and Mathematics (STEM) > Mathematics and Statistics
Interdisciplinary Research Centre: Centre for Research in Computing (CRC)
Item ID: 5003
Depositing User: Anne De Roeck
Date Deposited: 18 Jul 2006
Last Modified: 04 Oct 2016 15:17
URI: http://oro.open.ac.uk/id/eprint/5003