Sarkar, Avik; Garthwaite, Paul and De Roeck, Anne
(2005).
| URL: | http://acl.ldc.upenn.edu/W/W05/W05-0607.pdf |
|---|---|
Abstract
This paper proposes a model for term re-occurrence in a text collection based on the gaps between successive occurrences of a term. These gaps are modeled using a mixture of exponential distributions. Parameter estimation is based on a Bayesian framework that allows us to fit a flexible model. The model provides measures of a term’s re-occurrence rate and within-document burstiness. The model works for all kinds of terms, whether rare content words, medium-frequency terms or frequent function words. A measure is proposed to account for a term’s importance based on its distribution pattern in the corpus.
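As a rough illustration of the gap-based view described in the abstract, the sketch below extracts the gaps between successive occurrences of a term and evaluates a mixture-of-exponentials density on them. It is not the authors' implementation: the abstract does not state the number of mixture components or how gaps are counted, so the two-component form, the token-position gap definition, and the names `term_gaps`, `mixture_pdf`, `log_likelihood`, `p`, `rate_1` and `rate_2` are all assumptions made here for illustration. The Bayesian parameter estimation is not shown.

```python
import math


def term_gaps(tokens, term):
    """Return the gaps (in token positions) between successive occurrences of `term`.

    Assumption: a gap is the difference between consecutive token indices.
    """
    positions = [i for i, tok in enumerate(tokens) if tok == term]
    return [b - a for a, b in zip(positions, positions[1:])]


def mixture_pdf(gap, p, rate_1, rate_2):
    """Density of a two-component exponential mixture at `gap`.

    Assumption: two components with mixing weight `p` and rates
    `rate_1`, `rate_2` (hypothetical parameter names).
    """
    return (p * rate_1 * math.exp(-rate_1 * gap)
            + (1.0 - p) * rate_2 * math.exp(-rate_2 * gap))


def log_likelihood(gaps, p, rate_1, rate_2):
    """Log-likelihood of the observed gaps under the mixture."""
    return sum(math.log(mixture_pdf(g, p, rate_1, rate_2)) for g in gaps)


# Toy example: positions of "the" are 0, 4 and 7.
tokens = "the cat sat on the mat and the cat slept".split()
gaps = term_gaps(tokens, "the")
```

In a sketch like this, one component with a high rate would capture short gaps (bursty re-use of a term), while a low-rate component would capture long gaps; how the paper actually interprets its components is not stated in the abstract.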
| Item Type: | Conference Item |
|---|---|
| Keywords: | term distribution modelling; term burstiness; natural language processing; Bayesian modelling |
| Academic Unit/Department: | Mathematics, Computing and Technology > Computing; Mathematics, Computing and Technology > Mathematics and Statistics; Mathematics, Computing and Technology |
| Interdisciplinary Research Centre: | Centre for Research in Computing (CRC) |
| Item ID: | 5003 |
| Depositing User: | Anne De Roeck |
| Date Deposited: | 18 Jul 2006 |
| Last Modified: | 04 Dec 2010 04:46 |
| URI: | http://oro.open.ac.uk/id/eprint/5003 |




