Dickinson, Thomas Kier
(2019).
DOI: https://doi.org/10.21954/ou.ro.00010aa9
Abstract
Social media has become a dominant force over the past 15 years, with the rise of sites such as Facebook, Instagram, and Twitter. Some of us have been with these sites since the start, posting all about our personal lives and building up a digital identity of ourselves.
But within this myriad of posts, what actually matters to us, and what do our digital identities tell people about ourselves? One way to start filtering through this data is to build classifiers that can identify posts about our personal life events, allowing us to self-reflect on what we share online.
This type of technology also has direct merits within marketing, allowing companies to target customers with better products. We further suggest that the techniques and methodologies built throughout this thesis could support research in other areas, such as cyberbullying and radicalisation detection.
The aim of this thesis is to build upon the under-researched area of life event detection, specifically targeting Twitter and Instagram. Our goal is to develop classifiers that identify life events drawn from a list inspired by cognitive psychology, of which we target a total of seven within this thesis.
To achieve this, we look to answer three research questions, one in each of our empirical chapters. In our first empirical chapter, we ask: what features would improve the classification of important life events? To answer this, we first extract a new dataset from Twitter targeting the following events: Getting Married, Having Children, Starting School, Falling in Love, and Death of a Parent. We look at three new feature sets (interaction, content, and semantic features) and compare them against a current state-of-the-art technique.
In our second empirical chapter, we draw inspiration from cheminformatics and frequent sub-graph mining to ask: could the inclusion of semantic and syntactic patterns improve the performance of our life event classifier? Here we expand our tweets into semantic networks, and consider two forms of syntactic relationships between tokens. We then mine for frequent sub-graphs amongst our tweet graphs, and use these as features in our classifier. Our results produce F1 scores of between 0.65 and 0.77, an improvement of between 0.01 and 0.04 over the current state of the art.
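As a rough illustration of using frequent graph patterns as classifier features, the following minimal Python sketch may help. It is not the method of the thesis: it assumes each tweet is reduced to a graph whose edges link adjacent tokens, and it only mines single-edge patterns, whereas a full frequent sub-graph miner would search for larger connected patterns. The function names, the toy tweets, and the support threshold are all hypothetical.

```python
from collections import Counter

def tweet_to_edges(tokens):
    """Assumed representation: a tweet graph as the set of directed
    edges between adjacent tokens (a simple syntactic relation)."""
    return {(a, b) for a, b in zip(tokens, tokens[1:])}

def mine_frequent_edges(graphs, min_support):
    """Return single-edge patterns that occur in at least `min_support`
    graphs. Each graph is a set, so an edge counts once per tweet."""
    counts = Counter(edge for graph in graphs for edge in graph)
    return sorted(edge for edge, count in counts.items() if count >= min_support)

def to_feature_vector(graph, patterns):
    """Binary features: 1 if the pattern is present in the graph, else 0."""
    return [1 if pattern in graph else 0 for pattern in patterns]

# Toy, made-up tweets for illustration only.
tweets = [
    "we just got married".split(),
    "they got married today".split(),
    "first day at school".split(),
]
graphs = [tweet_to_edges(tokens) for tokens in tweets]
patterns = mine_frequent_edges(graphs, min_support=2)
features = [to_feature_vector(graph, patterns) for graph in graphs]
```

The resulting binary vectors could then be passed to any standard classifier alongside other feature sets.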
In our final empirical chapter, we look to answer our third research question: how can we detect important life events from other social media sites, such as Instagram? We ask this question because we believe Instagram to be a preferred environment for sharing personal life events. In this chapter, we extract a new dataset, targeting the following events: Getting Married, Having Children, Starting School, Graduation, and Buying a House. Our results show that our methodology provides F1 scores of between 0.78 and 0.82, an improvement of between 0.01 and 0.04 over the current state of the art.