Factive / non-factive predicate recognition within Question Generation systems

Wyse, Brendan (2009). Factive / non-factive predicate recognition within Question Generation systems. Student dissertation for The Open University module M801 MSc in Software Development Research Dissertation.

Please note that this student dissertation is made available in the format that it was submitted for examination, thus the author has not been able to correct errors and/or departures from academic standards in areas such as referencing.

DOI: https://doi.org/10.21954/ou.ro.000160ab

Abstract

The research in this paper relates to Question Generation (QG), an area of computational and linguistic study whose goal is to enable machines to ask questions using human language. QG requires processing a sentence to generate one or more questions relating to that sentence. This research focuses on the sub-problem of generating questions whose answer can be obtained from the input sentence.

One issue with generating such questions arises when a proposition in a declarative content clause is taken to be true when it might not actually be. Figure a.1 shows two sentences with the same declarative content clause (underlined) but different predicate verbs (bold). The certainty that the proposition in the declarative content clause is true differs between the two.

Figure a.1 Predicate verbs

A QG system unable to distinguish between the sentences in Figure a.1 might generate the question ‘How many people were at the conference?’ Whilst this is grammatically a valid question, it cannot be definitively answered given sentence (1). From (1) we are not absolutely certain how many people were at the conference, because the speaker in the sentence is not absolutely certain. In a system designed to generate only questions that can be answered by the input sentence, this is a flaw.

The verb ‘know’ is a factive verb. A factive verb “assigns the status of an established fact to its object” (Soanes and Stevenson, 2005a). The verb ‘think’ is a non-factive. A non-factive is a verb “that takes a clausal object which may or may not designate a true fact” (Soanes and Stevenson, 2005b).

This research asks the question: what is the impact of enabling a QG system to recognise sentences containing these factive or non-factive verbs? Impact was regarded as both the overall impact such a system might have on QG as a whole and the quality improvements that might be obtainable.
A QG system was written as part of this research, and a sub-task was implemented in it: a software algorithm that performs factive / non-factive recognition. The algorithm uses a list of factive and non-factive verbs produced by Hooper (1974), which was expanded using a thesaurus. The expanded list allowed me to determine the frequency of occurrence of factive / non-factive indicators and thus analyse overall impact. The same list was then used within the QG system to analyse the improvement in question quality.

The analysis of factive / non-factive recognition was carried out using the Open University’s online educational resource, OpenLearn. OpenLearn was chosen because it is educational material and is available in a well-marked-up XML format, which makes it easy to extract specific content.

It was found that factive and non-factive verbs are common enough in educational discourse to justify further work on factivity recognition. Where the generated question must be answerable from the input sentence, the effect on precision was quite good. Whilst the module was successful in removing unwanted questions, it also removed some perfectly good questions. Previous research has concluded, however, that it is better to generate questions of higher precision, and I agree.
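
The list-based recognition described above can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: only ‘know’ and ‘think’ are taken from the abstract, while the other verbs, the small inflection table, the function names (`classify`, `indicator_counts`, `answerable`) and the example sentences are assumptions made for illustration. A real system would presumably use the expanded Hooper list and proper lemmatisation.

```python
import re
from collections import Counter

# Illustrative seed lists only. 'know' and 'think' come from the abstract;
# the other verbs are assumptions, not the dissertation's expanded list.
FACTIVE = {"know", "realise", "regret"}
NON_FACTIVE = {"think", "believe", "suppose"}

# Crude inflection table standing in for real lemmatisation (assumption).
INFLECTIONS = {
    "knows": "know", "knew": "know", "known": "know",
    "realises": "realise", "realised": "realise",
    "regrets": "regret", "regretted": "regret",
    "thinks": "think", "thought": "think",
    "believes": "believe", "believed": "believe",
    "supposes": "suppose", "supposed": "suppose",
}


def classify(sentence):
    """Return 'factive', 'non-factive', or None for the first indicator found."""
    for token in re.findall(r"[a-z]+", sentence.lower()):
        base = INFLECTIONS.get(token, token)
        if base in FACTIVE:
            return "factive"
        if base in NON_FACTIVE:
            return "non-factive"
    return None


def indicator_counts(sentences):
    """Count how many sentences carry each kind of indicator (or none)."""
    return Counter(classify(s) for s in sentences)


def answerable(sentence):
    """Keep a sentence as a question source unless it is marked non-factive."""
    return classify(sentence) != "non-factive"


if __name__ == "__main__":
    # Hypothetical sentences, invented for this sketch.
    corpus = [
        "I think 200 people were at the conference.",
        "I know 200 people were at the conference.",
        "The conference was held in May.",
    ]
    print(indicator_counts(corpus))
    print([s for s in corpus if answerable(s)])
```

Matching on surface tokens is deliberately naive: it would, for example, treat the noun ‘thought’ as a verb, which is one way a filter of this kind can discard perfectly good questions.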
