Tobaili, Taha; Fernandez, Miriam; Alani, Harith; Sharafeddine, Sanaa; Hajj, Hazem and Glavas, Goran (2019).
Abstract
Arabizi is an informal written form of dialectal Arabic transcribed in Latin alphanumeric characters. It has proven popular on chat platforms and social media, yet it suffers from a severe lack of natural language processing (NLP) resources. As a result, texts written in Arabizi are often disregarded in sentiment analysis tasks for Arabic. In this paper, we describe the creation of a sentiment lexicon for Arabizi that was enriched with word embeddings. The resulting Arabizi lexicon consists of 11.3K positive and 13.3K negative words. We evaluated this lexicon by classifying the sentiment of Arabizi tweets, achieving an F1-score of 0.72. We provide a detailed error analysis to present the challenges that impact the sentiment analysis of Arabizi.
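The abstract does not spell out how the embeddings enrich the lexicon or how tweets are scored, so the following is only a minimal sketch under stated assumptions. It assumes embedding-based expansion, where an unlabelled word takes the polarity of its most similar seed word if their cosine similarity reaches a hypothetical threshold of 0.8. It also assumes a count-based classifier (positive minus negative hits) and a macro-averaged F1 over the positive and negative classes. The seed words, 2-d vectors, tweets and gold labels are toy values invented for illustration, not data from the paper.

```python
import math

# Toy sketch only: the expansion rule, threshold and scoring below are
# assumptions for illustration, not the method described in the paper.


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_lexicon(seed, embeddings, threshold=0.8):
    """Give each unlabelled word the polarity of its most similar seed word,
    provided that similarity reaches the (hypothetical) threshold."""
    expanded = dict(seed)
    for word, vec in embeddings.items():
        if word in seed:
            continue
        best_label, best_sim = None, threshold
        for seed_word, label in seed.items():
            if seed_word not in embeddings:
                continue
            sim = cosine(vec, embeddings[seed_word])
            if sim >= best_sim:
                best_label, best_sim = label, sim
        if best_label is not None:
            expanded[word] = best_label
    return expanded


def classify(tweet, lexicon):
    """Count-based polarity: +1 per positive token, -1 per negative token."""
    score = 0
    for token in tweet.lower().split():
        label = lexicon.get(token)
        if label == "positive":
            score += 1
        elif label == "negative":
            score -= 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def f1(gold, pred, label):
    """Per-class F1 score computed from true/false positives and false negatives."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy seed lexicon and 2-d word vectors.
SEED = {"7elo": "positive", "zbele": "negative"}
EMBEDDINGS = {
    "7elo": [1.0, 0.1],
    "7lwe": [0.95, 0.15],
    "zbele": [0.1, 1.0],
    "zbeleh": [0.12, 0.97],
    "ktir": [0.7, 0.7],  # similarity ~0.77 to both seeds, so left unlabelled
}

LEXICON = expand_lexicon(SEED, EMBEDDINGS)
TWEETS = ["ktir 7lwe", "shi zbeleh", "7elo w 7lwe", "mish zbele"]
GOLD = ["positive", "negative", "positive", "positive"]  # toy labels
PRED = [classify(t, LEXICON) for t in TWEETS]
MACRO_F1 = (f1(GOLD, PRED, "positive") + f1(GOLD, PRED, "negative")) / 2

if __name__ == "__main__":
    print(PRED)  # ['positive', 'negative', 'positive', 'negative']
    print(round(MACRO_F1, 3))  # 0.733
```

In this toy run the spelling variants "7lwe" and "zbeleh" inherit their polarity from nearby seed words, which is the kind of coverage gain embedding-based enrichment aims for. The last tweet, whose toy gold label is positive, is misclassified because a plain word count ignores the preceding "mish".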
About
- Item ORO ID
- 66829
- Item Type
- Conference or Workshop Item
- ISBN
- 954-452-056-2, 978-954-452-056-4
- Academic Unit or School
- Faculty of Science, Technology, Engineering and Mathematics (STEM) > Knowledge Media Institute (KMi); Faculty of Science, Technology, Engineering and Mathematics (STEM)
- Depositing User
- Taha Tobaili