Bilateral Dependency Neural Networks for Cross-Language Algorithm Classification

Bui, Nghi; Yu, Yijun and Jiang, Lingxiao (2019). Bilateral Dependency Neural Networks for Cross-Language Algorithm Classification. In: The 26th IEEE International Conference on Software Analysis, Evolution, and Reengineering (Lo, David and Shihab, Emad eds.), 24-27 Feb 2019, Hangzhou, China, IEEE Computer Society.

Abstract

Algorithm classification automatically identifies the classes of a program based on the algorithm(s) and/or data structure(s) it implements. It is useful for tasks such as code reuse, code theft detection, and malware detection. Code similarity metrics based on features extracted from syntax and semantics have been used to classify programs. Such features, however, often require manual selection and are specific to individual programming languages, which limits the classifiers to programs in the same language.

To recognise the similarities and differences among algorithms implemented in different languages, this paper describes a framework of Bilateral Neural Networks (Bi-NN) that builds a neural network on top of two underlying sub-networks, each of which encodes syntax and semantics of code in one language. A whole Bi-NN can be trained with bilateral programs that implement the same algorithms and/or data structures in different languages and then be applied to recognise algorithm classes across languages.
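The bilateral structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `TokenEncoder` and `BiNN` classes, the averaged token embeddings, and the single sigmoid output (read here as the likelihood that two programs implement the same algorithm) are all assumptions made for illustration, and training is omitted entirely.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TokenEncoder:
    """Hypothetical sub-network: averages token embeddings for one language."""

    def __init__(self, vocab, dim, rng):
        self.index = {tok: i for i, tok in enumerate(vocab)}
        self.dim = dim
        self.emb = make_matrix(len(vocab), dim, rng)

    def encode(self, tokens):
        # Look up embeddings for known tokens; unknown tokens are skipped.
        known = [self.emb[self.index[t]] for t in tokens if t in self.index]
        if not known:
            return [0.0] * self.dim
        # Average each embedding dimension across the known tokens.
        return [sum(col) / len(known) for col in zip(*known)]


class BiNN:
    """Two language-specific encoders joined by one shared scoring layer."""

    def __init__(self, left, right, rng):
        self.left = left
        self.right = right
        size = left.dim + right.dim
        self.weights = [rng.uniform(-0.1, 0.1) for _ in range(size)]
        self.bias = 0.0

    def forward(self, left_tokens, right_tokens):
        # Concatenate the two code vectors, then apply a linear layer + sigmoid.
        joint = self.left.encode(left_tokens) + self.right.encode(right_tokens)
        score = sum(w * x for w, x in zip(self.weights, joint)) + self.bias
        return 1.0 / (1.0 + math.exp(-score))


# Untrained example: one encoder per language, one pair of token sequences.
rng = random.Random(0)
java_enc = TokenEncoder(["for", "int", "swap", "while"], dim=4, rng=rng)
cpp_enc = TokenEncoder(["for", "int", "std::swap", "while"], dim=4, rng=rng)
model = BiNN(java_enc, cpp_enc, rng)
pair_score = model.forward(["for", "int", "swap"], ["for", "int", "std::swap"])
```

In this sketch each encoder keeps its own vocabulary and weights, so only the top layer sees both languages. Swapping `TokenEncoder` for a tree- or graph-based encoder would leave `BiNN` unchanged, provided the replacement exposes `dim` and `encode`.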

We have instantiated the framework with several kinds of token-, tree- and graph-based neural networks that encode and learn various kinds of information in code. We have applied the instances of the framework to a code corpus collected from GitHub containing thousands of Java and C++ programs implementing 50 different algorithms and data structures.

Our evaluation results show that the use of Bi-NN indeed produces promising algorithm classification results both within one language and across languages, and the encoding of dependencies from code into the underlying neural networks helps improve algorithm classification accuracy further. In particular, our custom-built dependency trees with tree-based convolutional neural networks achieve the highest classification accuracy among the different instances of the framework that we have evaluated.

Our study points to a possible future research direction to tailor bilateral and multilateral neural networks that encode more relevant semantics for code learning, mining and analysis tasks.

Item Type: Conference or Workshop Item
Copyright Holders: 2018 IEEE
Project Funding Details:
Funded Project Name: SAUSE: Secure, Adaptive, Usable Software Engineering
Project ID: EP/R013144/1 (previous: EP/R005095/1)
Funding Body: EPSRC (Engineering and Physical Sciences Research Council)
Keywords: cross-language mapping; program classification; algorithm classification; code embedding; code dependency; neural networks; bilateral neural networks
Academic Unit/School: Faculty of Science, Technology, Engineering and Mathematics (STEM) > Computing and Communications
Faculty of Science, Technology, Engineering and Mathematics (STEM)
Item ID: 58410
Depositing User: Yijun Yu
Date Deposited: 21 Dec 2018 15:21
Last Modified: 16 Jul 2019 12:06
URI: http://oro.open.ac.uk/id/eprint/58410