Pride, David and Knoth, Petr
(2017).
Abstract
This work looks in depth at several studies that have attempted to automate the process of citation importance classification based on the publications’ full text. We offer a comparison of their individual similarities, strengths and weaknesses. We analyse a range of features that have been previously used in this task. Our experimental results confirm that the number of in-text references is highly predictive of influence. Contrary to the work of Valenzuela et al. (2015), we find abstract similarity to be one of the most predictive features. Overall, we show that many of the features previously described in the literature have either been reported as not particularly predictive, cannot be reproduced from their existing descriptions, or should not be used because they rely on external evidence that changes over time. Additionally, we find significant variance in the results provided by the PDF extraction tools used in the pre-processing stages of citation extraction. This has a direct and significant impact on the classification features that rely on this extraction process. Consequently, we discuss challenges and potential improvements in the classification pipeline, provide a critical review of the performance of individual features and address the importance of constructing a large-scale gold-standard reference dataset.
About
- Item ORO ID: 51751
- Item Type: Conference or Workshop Item
- Keywords: citation analysis; bibliometrics; scientometrics; NLP; semantometrics
- Academic Unit or School: Faculty of Science, Technology, Engineering and Mathematics (STEM) > Knowledge Media Institute (KMi); Faculty of Science, Technology, Engineering and Mathematics (STEM)
- Research Group: Centre for Research in Computing (CRC); Big Scientific Data and Text Analytics Group (BSDTAG)
- Copyright Holders: © 2017 The Authors
- Related URLs: http://www.issi2017.org/ (Other)
- Depositing User: David Pride