The Open UniversitySkip to content
 

Linked knowledge sources for topic classification of microposts: a semantic graph-based approach

Varga, Andrea; Cano Basave, Amparo Elizabeth; Rowe, Matthew; Ciravegna, Fabio and He, Yulan (2014). Linked knowledge sources for topic classification of microposts: a semantic graph-based approach. Journal of Web Semantics: Science, Services and Agents on the World Wide Web, 26 pp. 36–57.

Full text available as:
[img]
Preview
PDF (Accepted Manuscript) - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Download (1MB) | Preview
DOI (Digital Object Identifier) Link: https://doi.org/10.1016/j.websem.2014.04.001
Google Scholar: Look up in Google Scholar

Abstract

Short text messages, a.k.a microposts (e.g., tweets), have proven to be an effective channel for revealing information about trends and events, ranging from those related to disaster (e.g., Hurricane Sandy) to those related to violence (e.g., Egyptian revolution). Being informed about such events as they occur could be extremely important to authorities and emergency professionals by allowing such parties to immediately respond.

In this work we study the problem of topic classification (TC) of microposts, which aims to automatically classify short messages based on the subject(s) discussed in them. The accurate TC of microposts however is a challenging task since the limited number of tokens in a post often implies a lack of sufficient contextual information.

In order to provide contextual information to microposts, we present and evaluate several graph structures surrounding concepts present in linked knowledge sources (KSs). Traditional TC techniques enrich the content of microposts with features extracted only from the microposts content. In contrast our approach relies on the generation of different weighted semantic meta-graphs extracted from linked KSs. We introduce a new semantic graph, called category meta-graph. This novel meta-graph provides a more fine grained categorisation of concepts providing a set of novel semantic features. Our findings show that such category meta-graph features effectively improve the performance of a topic classifier of microposts.

Furthermore our goal is also to understand which semantic feature contributes to the performance of a topic classifier. For this reason we propose an approach for automatic estimation of accuracy loss of a topic classifier on new, unseen microposts. We introduce and evaluate novel topic similarity measures, which capture the similarity between the KS documents and microposts at a conceptual level, considering the enriched representation of these documents.

Extensive evaluation in the context of Emergency Response (ER) and Violence Detection (VD) revealed that our approach outperforms previous approaches using single KS without linked data and Twitter data only up to 31.4% in terms of F1 measure. Our main findings indicate that the new category graph contains useful information for TC and achieves comparable results to previously used semantic graphs. Furthermore our results also indicate that the accuracy of a topic classifier can be accurately predicted using the enhanced text representation, outperforming previous approaches considering content-based similarity measures.

Item Type: Journal Item
Copyright Holders: 2014 Elsevier B.V.
ISSN: 1570-8268
Keywords: linked knowledge sources; semantic concept graphs; topic classification
Academic Unit/School: Faculty of Science, Technology, Engineering and Mathematics (STEM) > Knowledge Media Institute (KMi)
Faculty of Science, Technology, Engineering and Mathematics (STEM)
Research Group: Centre for Research in Computing (CRC)
Item ID: 41414
Depositing User: Amparo Cano Basave
Date Deposited: 26 Nov 2014 13:14
Last Modified: 08 Dec 2018 06:29
URI: http://oro.open.ac.uk/id/eprint/41414
Share this page:

Metrics

Altmetrics from Altmetric

Citations from Dimensions

Download history for this item

These details should be considered as only a guide to the number of downloads performed manually. Algorithmic methods have been applied in an attempt to remove automated downloads from the displayed statistics but no guarantee can be made as to the accuracy of the figures.

Actions (login may be required)

Policies | Disclaimer

© The Open University   contact the OU