Saif, Hassan; Fernández, Miriam and Alani, Harith
(2014).
URL: http://ceur-ws.org/Vol-1272/paper_55.pdf
Abstract
In this paper we propose a semantic approach to automatically identify and remove stopwords from Twitter data. Unlike most existing approaches, which rely on outdated and context-insensitive stopword lists, our proposed approach considers the contextual semantics and sentiment of words in order to measure their discrimination power. Evaluation results on 6 Twitter datasets show that removing our semantically identified stopwords from tweets increases binary sentiment classification performance over the classic pre-compiled stopword list by 0.42% in accuracy and 0.94% in F-measure. Our approach also reduces the sentiment classifier's feature space by 48.34% and the dataset sparsity by 1.17%, on average, compared to the classic method.
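The feature-space and sparsity figures in the abstract can be made concrete with a small sketch. The code below is an illustration only, not the paper's method: it assumes a binary bag-of-words representation with whitespace tokenization, and the function name `feature_stats`, the toy documents, and the two-word stopword set are all hypothetical. It shows how vocabulary size (feature space) and the share of zero cells in the document-term matrix (sparsity) might be measured before and after removing a given stopword set.

```python
def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real tweets need more care.
    return text.lower().split()


def feature_stats(docs, stopwords=frozenset()):
    """Return (vocabulary size, sparsity) for a binary document-term matrix.

    Sparsity is the fraction of zero cells in the matrix whose rows are
    documents and whose columns are the distinct remaining terms.
    """
    # One set of distinct, non-stopword terms per document.
    token_sets = [
        {tok for tok in tokenize(doc) if tok not in stopwords} for doc in docs
    ]
    vocab = set().union(*token_sets)
    n_cells = len(docs) * len(vocab)
    nonzero = sum(len(terms) for terms in token_sets)
    sparsity = 1.0 - nonzero / n_cells if n_cells else 0.0
    return len(vocab), sparsity


# Toy data for illustration only; not drawn from the paper's datasets.
docs = ["the cat sat", "the dog ran", "a cat ran"]
stopwords = frozenset({"the", "a"})  # hypothetical stopword list

size_before, sparsity_before = feature_stats(docs)
size_after, sparsity_after = feature_stats(docs, stopwords)
reduction = 100.0 * (size_before - size_after) / size_before
print(f"feature space: {size_before} -> {size_after} ({reduction:.1f}% smaller)")
print(f"sparsity: {sparsity_before:.2f} -> {sparsity_after:.2f}")
```

Running the same measurement with two different stopword sets, for example a pre-compiled list and an automatically derived one, gives the kind of side-by-side comparison the abstract reports.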