The Open University

Improving the tokenisation of identifier names

Butler, Simon; Wermelinger, Michel; Yu, Yijun and Sharp, Helen (2011). Improving the tokenisation of identifier names. In: ECOOP 2011 – Object-Oriented Programming (Mezini, Mira ed.), Lecture Notes in Computer Science, Springer Verlag, pp. 130–154.



Identifier names are the main vehicle for semantic information during program comprehension. For tool-supported program comprehension tasks, including concept location and requirements traceability, identifier names need to be tokenised into their semantic constituents. In this paper we present an approach to the automated tokenisation of identifier names that improves on existing techniques in two ways. First, it improves the tokenisation accuracy for single-case identifier names and for identifier names containing digits, which existing techniques largely ignore. Second, performance gains over existing techniques are achieved using smaller oracles, making the approach easier to deploy.
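To make the task concrete, the following is a minimal sketch of identifier tokenisation, not the algorithm presented in the paper. It assumes a conventional first pass that splits on separators, camel-case boundaries and letter/digit boundaries, followed by a longest-prefix dictionary lookup for single-case tokens. The names `ORACLE`, `split_conventional`, `split_single_case` and `tokenise` are hypothetical, and the tiny word list merely stands in for the oracles mentioned above. Isolating digits unconditionally, as done here, is exactly the kind of naive treatment that can mis-split names in which a digit belongs to a word.

```python
import re

# Hypothetical miniature oracle: a stand-in for the word lists a real
# tool would consult. These entries are illustrative, not from the paper.
ORACLE = {"out", "put", "output", "stream", "file", "name", "http", "server"}

# Camel-case boundaries: lower-to-upper ("fileName") and the end of an
# acronym followed by a capitalised word ("HTTPServer").
_CAMEL = re.compile(r"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")
# Boundaries between letters and digits, in either direction (naive).
_DIGIT = re.compile(r"(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])")


def split_conventional(identifier):
    """Split on '_' and '$', camel-case boundaries and digit boundaries."""
    marked = _CAMEL.sub("_", identifier)
    marked = _DIGIT.sub("_", marked)
    return [t for t in re.split(r"[_$]+", marked) if t]


def split_single_case(token, oracle=ORACLE):
    """Split a token into oracle words, longest prefix first.

    Returns None when no complete split into oracle words exists.
    """
    if not token:
        return []
    lowered = token.lower()
    for end in range(len(lowered), 0, -1):
        if lowered[:end] in oracle:
            rest = split_single_case(token[end:], oracle)
            if rest is not None:
                return [token[:end]] + rest
    return None


def tokenise(identifier):
    """Conventional split first, then try the oracle on each alphabetic part."""
    tokens = []
    for part in split_conventional(identifier):
        pieces = split_single_case(part) if part.isalpha() else None
        tokens.extend(pieces or [part])
    return tokens
```

With this toy oracle, `tokenise("outputstream")` yields `["output", "stream"]`, whereas a splitter relying only on separators and case changes would return the name unsplit. The quality of such a lookup depends heavily on the oracle's contents, which is why oracle size matters for deployment.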

Accuracy was evaluated by comparing the output of our algorithm to manual tokenisations of 28,000 identifier names drawn from 60 well-known open source Java projects totalling 16.5 MSLOC. Moreover, the projects were used to perform a study of identifier tokenisation features (single case, camel case, use of digits, etc.) per object-oriented construct (class names, method names, local variable names, etc.), thus providing an insight into naming conventions in industrial-scale object-oriented code. Our tokenisation tool and datasets are publicly available.
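As an illustration of this kind of evaluation, the sketch below scores a tokeniser against manual tokenisations and labels the surface features of a name. The abstract does not state the exact accuracy measure, so exact-match accuracy is an assumption here, and the feature rules are simplified. The function names `exact_match_accuracy` and `features` are hypothetical and do not come from the published tool.

```python
def exact_match_accuracy(predicted, reference):
    """Fraction of identifiers whose predicted tokens equal the manual ones.

    Both arguments map an identifier name to its list of tokens.
    Exact match is an assumed measure, not necessarily the paper's.
    """
    if not reference:
        raise ValueError("reference set is empty")
    hits = sum(1 for name, tokens in reference.items()
               if predicted.get(name) == tokens)
    return hits / len(reference)


def features(identifier):
    """Label surface features of a name using simplified, illustrative rules."""
    letters = [c for c in identifier if c.isalpha()]
    labels = set()
    if any(c.isdigit() for c in identifier):
        labels.add("digits")
    if letters and (all(c.islower() for c in letters)
                    or all(c.isupper() for c in letters)):
        labels.add("single case")
    elif any(a.islower() and b.isupper()
             for a, b in zip(identifier, identifier[1:])):
        labels.add("camel case")
    return labels


# Hypothetical data for illustration only; not drawn from the paper's dataset.
manual = {"outputstream": ["output", "stream"], "fileName": ["file", "Name"]}
automatic = {"outputstream": ["output", "stream"], "fileName": ["fileName"]}
score = exact_match_accuracy(automatic, manual)
```

Grouping the labels returned by `features` by construct (class, method, local variable) would give a per-construct breakdown of the kind the study describes.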

Item Type: Conference or Workshop Item
Copyright Holders: 2011 Springer Verlag
ISBN: 3-642-22654-X, 978-3-642-22654-0
Extra Information: The software and datasets described in this paper are available from the related URL given below.
Academic Unit/School: Faculty of Science, Technology, Engineering and Mathematics (STEM) > Computing and Communications
Faculty of Science, Technology, Engineering and Mathematics (STEM)
Research Group: Centre for Research in Computing (CRC)
Related URLs:
Item ID: 25656
Depositing User: Michel Wermelinger
Date Deposited: 18 Mar 2011 11:08
Last Modified: 07 Dec 2018 18:22