Saif, Hassan; Fernández, Miriam and Alani, Harith (2014).
URL: http://ceur-ws.org/Vol-1272/paper_55.pdf
Abstract
In this paper we propose a semantic approach to automatically identify and remove stopwords from Twitter data. Unlike most existing approaches, which rely on outdated and context-insensitive stopword lists, our proposed approach considers the contextual semantics and sentiment of words in order to measure their discrimination power. Evaluation results on 6 Twitter datasets show that removing our semantically identified stopwords from tweets increases the binary sentiment classification performance over the classic pre-compiled stopword list by 0.42% and 0.94% in accuracy and F-measure respectively. Also, our approach reduces the sentiment classifier's feature space by 48.34% and the dataset sparsity by 1.17%, on average, compared to the classic method.
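The two size measures mentioned in the abstract, feature space and dataset sparsity, can be illustrated with a minimal sketch. This is not the authors' semantic method: the stopword set below is hand-picked for illustration, the toy tweets are invented, and the function names (`vocabulary`, `sparsity`, `remove_stopwords`) are hypothetical. Sparsity is assumed here to mean the fraction of zero cells in a binary tweet-term matrix; the paper may define it differently.

```python
def vocabulary(tweets):
    """Return the sorted list of distinct tokens across all tweets."""
    return sorted({tok for tweet in tweets for tok in tweet})


def sparsity(tweets):
    """Fraction of zero cells in the binary tweet-term matrix (assumed definition)."""
    vocab = vocabulary(tweets)
    if not tweets or not vocab:
        return 0.0
    # Each tweet contributes one non-zero cell per distinct token it contains.
    nonzero = sum(len(set(tweet)) for tweet in tweets)
    return 1.0 - nonzero / (len(tweets) * len(vocab))


def remove_stopwords(tweets, stopwords):
    """Drop every token that appears in the given stopword set."""
    return [[tok for tok in tweet if tok not in stopwords] for tweet in tweets]


# Toy data, invented for illustration only.
tweets = [
    "the new phone is great".split(),
    "the battery is awful".split(),
    "i love the new camera".split(),
]
stopwords = {"the", "is", "i"}  # hand-picked, not semantically identified

filtered = remove_stopwords(tweets, stopwords)

before, after = len(vocabulary(tweets)), len(vocabulary(filtered))
print(f"feature space: {before} -> {after} terms")
print(f"sparsity: {sparsity(tweets):.3f} -> {sparsity(filtered):.3f}")
```

On such a tiny sample the sparsity figure need not move in the same direction as in the reported results; the sketch only shows how the two quantities could be computed before and after filtering.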
About
- Item ORO ID
- 41400
- Item Type
- Conference or Workshop Item
- ISSN
- 1613-0073
- Extra Information
- ISWC-P&D 2014: ISWC 2014 Posters & Demonstrations Track, a track within the 13th International Semantic Web Conference (ISWC 2014). Edited by Matthew Horridge, Marco Rospocher, Jacco van Ossenbruggen.
- Keywords
- sentiment analysis; contextual semantics; stopwords; Twitter
- Academic Unit or School
- Faculty of Science, Technology, Engineering and Mathematics (STEM) > Knowledge Media Institute (KMi)
- Faculty of Science, Technology, Engineering and Mathematics (STEM)
- Research Group
- Centre for Research in Computing (CRC)
- Copyright Holders
- © 2014 for the individual papers by the papers' authors.
- Depositing User
- Hassan Saif