Sarkar, Avik; Garthwaite, Paul and De Roeck, Anne (2005).
DOI: https://doi.org/10.3115/1706543.1706552
URL: http://acl.ldc.upenn.edu/W/W05/W05-0607.pdf
Abstract
This paper proposes a model for term re-occurrence in a text collection based on the gaps between successive occurrences of a term. These gaps are modeled using a mixture of exponential distributions. Parameter estimation is based on a Bayesian framework that allows us to fit a flexible model. The model provides measures of a term’s re-occurrence rate and within-document burstiness. The model works for all kinds of terms, be it a rare content word, a medium-frequency term or a frequent function word. A measure is proposed to account for the term’s importance based on its distribution pattern in the corpus.
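As a rough illustration of the idea in the abstract, the sketch below computes the gaps between successive occurrences of a term and evaluates the density of an exponential mixture at a given gap. The function names, the choice of exactly two components, and the parameter values are assumptions made for this example; the abstract does not give them, and the Bayesian parameter estimation it mentions is not shown here.

```python
import math


def term_gaps(tokens, term):
    """Return the gaps, in token positions, between successive occurrences of term."""
    positions = [i for i, tok in enumerate(tokens) if tok == term]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


def mixture_density(gap, weight, rate_a, rate_b):
    """Density at `gap` of a two-component exponential mixture.

    weight is the mixing proportion of the first component (assumed in [0, 1]);
    rate_a and rate_b are the two exponential rates (assumed positive).
    """
    return (weight * rate_a * math.exp(-rate_a * gap)
            + (1.0 - weight) * rate_b * math.exp(-rate_b * gap))


# Toy input; the sentence and the parameter values are purely illustrative.
tokens = "the cat sat on the mat and the dog sat by the cat".split()
gaps = term_gaps(tokens, "the")
densities = [mixture_density(g, weight=0.5, rate_a=1.0, rate_b=0.1) for g in gaps]
```

In a two-component reading, one rate could capture short gaps (bursts within a document) and the other long gaps between bursts, but that interpretation is an assumption of this sketch rather than a statement of the paper's exact parameterisation.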
About
- Item ORO ID: 5003
- Item Type: Conference or Workshop Item
- Keywords: term distribution modelling; term burstiness; natural language processing; Bayesian modelling
- Academic Unit or School:
  Faculty of Science, Technology, Engineering and Mathematics (STEM) > Computing and Communications
  Faculty of Science, Technology, Engineering and Mathematics (STEM)
  Faculty of Science, Technology, Engineering and Mathematics (STEM) > Mathematics and Statistics
- Research Group: Centre for Research in Computing (CRC)
- Depositing User: Anne De Roeck