Foreman, D. (2008). Evaluating semantic and rhetorical features to determine author attitude in tabloid newspaper articles. MSc dissertation, The Open University, module M801 (MSc in Software Development Research Dissertation).
DOI: https://doi.org/10.21954/ou.ro.00016078
Abstract
This dissertation investigates the potential of families of machine learning features to improve the accuracy of a semantic orientation classifier that assesses the attitudes of tabloid journalists towards the subjects of their opinion piece articles. A category of language, 'language of judgement', is defined, by which a journalist expresses an opinion matching their overall opinion of an article's subject matter. When the existence of 'language of judgement' was investigated, high inter-annotator agreement was found on per-document author attitude (Fleiss's kappa and Cohen's kappa were both 0.845), along with moderate agreement on per-sentence classification of judgemental or non-judgemental language (Fleiss's kappa of 0.507 and Cohen's kappa of 0.499). Three families of feature sets were defined to detect this language. The first family, 'semantic features', motivated by consideration of the theory of journalism, tags repetitions of nouns that are either located in particular sections of the article or occur multiple times in the article as potential language of judgement. The second and third families, 'rhetorical features', draw on Mann and Thompson's Rhetorical Structure Theory. For the second family, rhetorical relations are tagged to indicate the presence of potential language of judgement. For the third family, rhetorical relations are considered to mark potential shifts into and out of language of judgement. Areas of articles between tags from the first family and tags from the second family are tagged with features from this third family, indicating that the sentence potentially lies within an area of language of judgement bounded by these rhetorical relations. The feature sets, together or separately, were not very productive in acquiring judgemental language. Precision of 0.405 for the combined features was low, though it exceeded the overall proportion of judgemental language (32.8 per cent). Recall of 0.162 was very low.
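The kappa values quoted above measure agreement between annotators beyond what chance would predict. As a minimal sketch (the function name and sample labels below are illustrative, not taken from the dissertation), Cohen's kappa for two annotators can be computed as:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the agreement expected by chance from each annotator's
    marginal label distribution.
    """
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # observed agreement: fraction of items where the annotators agree
    p_o = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # expected chance agreement from the two marginal distributions
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[c] * counts_b[c]
              for c in set(labels_a) | set(labels_b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# e.g. 'J' = judgemental sentence, 'N' = non-judgemental sentence
kappa = cohens_kappa(["J", "J", "N", "N", "J", "N"],
                     ["J", "N", "N", "N", "J", "N"])
```

Fleiss's kappa generalises the same idea to more than two annotators by averaging per-item agreement, but follows the same (observed minus expected) / (1 minus expected) form.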
While experimentation with the testing corpus did not give strong evidence for the value of the feature sets, cross-validation tests on the training corpus showed greater potential, achieving precision of 0.520 and recall of 0.200. Inspection of learning curves created with the training corpus for the combination of all features showed that learning of judgemental language was taking place. This was also true of the 'rhetorical' second and third families when they were investigated separately, but was not seen for the first family of features. Weaknesses in corpus construction methodology are considered potentially responsible for the differences in results between corpora; suggested changes to remedy this, if more opinion piece articles can be collected, are described. When classifying per-document author attitude, using human-annotated language of judgement was seen to improve the accuracy of a semantic orientation classifier based on Turney's PMI-IR algorithm (in comparison with using all language in a document). However, classification using language selected by the machine learning method did not lead to a similar improvement. The low precision and recall for acquisition of language of judgement obtained on the testing corpus data is considered a likely cause of this.
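Turney's PMI-IR algorithm, mentioned above, estimates the semantic orientation of a phrase by comparing how often it co-occurs (in search engine results) with a positive anchor word such as "excellent" versus a negative anchor such as "poor". The sketch below assumes the hit counts have already been retrieved; the function name, parameters, and smoothing constant are illustrative, not from the dissertation:

```python
import math

def semantic_orientation(hits_near_excellent, hits_near_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """Semantic orientation in the style of Turney's PMI-IR:

        SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor")

    which, using search hit counts, reduces to the log-ratio below.
    A positive result suggests positive orientation, negative suggests
    negative. A small smoothing constant avoids division by zero for
    phrases that never co-occur with one of the anchor words.
    """
    return math.log2(
        ((hits_near_excellent + smoothing) * hits_poor)
        / ((hits_near_poor + smoothing) * hits_excellent)
    )

# Illustrative counts: a phrase seen near "excellent" 100 times and
# near "poor" 10 times, with equal anchor-word base rates.
so = semantic_orientation(100, 10, 1000, 1000)
```

Averaging such scores over the phrases in a document (or, as investigated here, only over phrases within language of judgement) yields a per-document attitude classification.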