Incidental or influential? – A decade of using text-mining for citation function classification.

Pride, David and Knoth, Petr (2017). Incidental or influential? – A decade of using text-mining for citation function classification. In: 16th International Society of Scientometrics and Informetrics Conference, 16-20 Oct 2017, Wuhan.


This work looks in depth at several studies that have attempted to automate the process of citation importance classification based on the publications’ full text. We compare their similarities, strengths and weaknesses, and analyse a range of features that have previously been used in this task. Our experimental results confirm that the number of in-text references is highly predictive of influence. Contrary to the work of Valenzuela et al. (2015), we find abstract similarity to be one of the most predictive features. Overall, we show that many of the features previously described in the literature have been reported as not particularly predictive, cannot be reproduced from their existing descriptions, or should not be used because they rely on external, changing evidence. Additionally, we find significant variance in the results provided by the PDF extraction tools used in the pre-processing stages of citation extraction. This has a direct and significant impact on the classification features that rely on this extraction process. Consequently, we discuss challenges and potential improvements in the classification pipeline, provide a critical review of the performance of individual features and address the importance of constructing a large-scale gold-standard reference dataset.

Item Type: Conference or Workshop Item
Copyright Holders: 2017 The Authors
Keywords: citation analysis; bibliometrics; scientometrics; NLP; semantometrics
Academic Unit/School: Faculty of Science, Technology, Engineering and Mathematics (STEM) > Knowledge Media Institute (KMi)
Faculty of Science, Technology, Engineering and Mathematics (STEM)
Research Group: Centre for Research in Computing (CRC)
Big Scientific Data and Text Analytics Group (BSDTAG)
Item ID: 51751
Depositing User: David Pride
Date Deposited: 25 Oct 2017 10:01
Last Modified: 02 May 2019 03:45