Montemurro, Marcelo A. and Zanette, Damián H. (2002).
DOI: https://doi.org/10.1142/S0219525902000493
Abstract
Beyond the local constraints imposed by grammar, words concatenated in long sequences carrying a complex message show statistical regularities that may reflect their linguistic role in the message. In this paper, we perform a systematic statistical analysis of the use of words in literary English corpora. We show that there is a quantitative relation between the role of content words in literary English and the Shannon information entropy defined over an appropriate probability distribution. Without assuming any previous knowledge about the syntactic structure of language, we are able to cluster certain groups of words according to their specific role in the text.
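The abstract does not say which probability distribution the entropy is defined over. As a minimal sketch of one plausible reading, assume the text is cut into equal-sized parts and that each word's distribution is the fraction of its occurrences falling in each part. The function name `word_entropies`, the parameter `num_parts`, and the normalization by the logarithm of the number of parts are illustrative assumptions, not details taken from this record.

```python
import math
from collections import Counter


def word_entropies(tokens, num_parts):
    """Normalized Shannon entropy of each word's spread over equal-sized parts.

    Assumption (not stated in the abstract): the distribution for a word is
    p_i = n_i / n, where n_i counts its occurrences in part i and n is its
    total count. The result is divided by log(num_parts) so it lies in [0, 1].
    """
    if num_parts < 2:
        raise ValueError("num_parts must be at least 2")

    # Size of each part; the last part may be shorter.
    part_size = math.ceil(len(tokens) / num_parts)

    # word -> Counter mapping part index to occurrences in that part
    counts = {}
    for pos, word in enumerate(tokens):
        counts.setdefault(word, Counter())[pos // part_size] += 1

    entropies = {}
    for word, per_part in counts.items():
        total = sum(per_part.values())
        # Shannon entropy over the parts in which the word occurs
        s = sum(-(n / total) * math.log(n / total) for n in per_part.values())
        entropies[word] = s / math.log(num_parts)
    return entropies


tokens = ["the", "whale", "the", "whale", "the", "sea", "the", "storm"]
entropies = word_entropies(tokens, num_parts=4)
```

Under these assumptions, a word spread evenly over all parts scores close to 1, while a word confined to a single part scores 0, so the value gives a rough measure of how localized a word's use is within the text.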
About
- Item ORO ID: 79228
- Item Type: Journal Item
- Keywords: entropy; natural language; Shannon information; Zipf's law
- Academic Unit or School: Faculty of Science, Technology, Engineering and Mathematics (STEM) > Mathematics and Statistics; Faculty of Science, Technology, Engineering and Mathematics (STEM)
- Copyright Holders: © 2002 World Scientific Publishing Co Pte Ltd
- Depositing User: Marcelo Montemurro