Identifying Cancer Drivers Using DRIVE: A Feature-Based Machine Learning Model for a Pan-Cancer Assessment of Somatic Missense Mutations

Dragomir, Ionut; Akbar, Adnan; Cassidy, John W.; Patel, Nirmesh; Clifford, Harry W. and Contino, Gianmarco (2021). Identifying Cancer Drivers Using DRIVE: A Feature-Based Machine Learning Model for a Pan-Cancer Assessment of Somatic Missense Mutations. Cancers, 13(11), article no. e2779.



Sporadic cancer develops from the accrual of somatic mutations. Of all small-scale somatic aberrations in coding regions, 95% are base substitutions, of which 90% are missense mutations. While multiple studies have focused on the importance of this mutation type, a machine learning method based on the number of protein–protein interactions (PPIs) has not been fully explored. This study aims to develop an improved computational method for driver identification, validation and evaluation (DRIVE) and to assess its performance against other methods. DRIVE distinguishes between driver and passenger mutations using a feature-based learning approach that comprises two levels of biological classification for a pan-cancer assessment of somatic mutations. Gene-level features include the maximum number of protein–protein interactions, the biological process and the type of post-translational modifications (PTMs), while mutation-level features are based on pathogenicity scores. Multiple supervised classification algorithms were trained on Genomics Evidence Neoplasia Information Exchange (GENIE) project data and then tested on an independent dataset from The Cancer Genome Atlas (TCGA) study. Finally, the most powerful classifier using DRIVE was evaluated on a benchmark dataset, where it showed better overall performance than other state-of-the-art methodologies; however, this result should be interpreted with considerable care because of the small size of the dataset. DRIVE illustrates the potential of feature-based learning models that combine multiple levels of biological information for the future of oncology-based precision medicine.
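The two-level feature idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Mutation` record, the field names, the scaling constant and every toy data point below are hypothetical, and k-nearest neighbours is used only because it is one of the algorithm families listed in the keywords. The sketch combines gene-level and mutation-level features into one vector, fits on one toy set and scores on a separate held-out toy set, mirroring the train-on-one-cohort, test-on-another design.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Mutation:
    """Hypothetical record; field names are illustrative assumptions."""
    max_ppi: int          # gene-level: maximum number of protein-protein interactions
    has_ptm: bool         # gene-level: crude stand-in for the PTM-type feature
    pathogenicity: float  # mutation-level: pathogenicity score, assumed in [0, 1]
    is_driver: bool       # label: driver (True) or passenger (False)


def to_vector(m, ppi_scale=100.0):
    """Scale all features to [0, 1] so no single feature dominates the distance."""
    return (min(m.max_ppi / ppi_scale, 1.0), float(m.has_ptm), m.pathogenicity)


def knn_predict(train, query, k=3):
    """Majority vote among the k training mutations closest to the query."""
    ranked = sorted(train, key=lambda t: dist(to_vector(t), to_vector(query)))
    votes = sum(t.is_driver for t in ranked[:k])
    return votes * 2 > k


def accuracy(train, test, k=3):
    """Fraction of held-out mutations whose predicted label matches the true label."""
    hits = sum(knn_predict(train, m, k) == m.is_driver for m in test)
    return hits / len(test)


# Invented toy data: a "training cohort" and an independent "test cohort".
TRAIN = [
    Mutation(90, True, 0.95, True),
    Mutation(75, True, 0.80, True),
    Mutation(60, False, 0.85, True),
    Mutation(10, False, 0.20, False),
    Mutation(5, False, 0.10, False),
    Mutation(20, True, 0.15, False),
]
TEST = [
    Mutation(80, True, 0.90, True),
    Mutation(8, False, 0.05, False),
]

if __name__ == "__main__":
    print("held-out accuracy:", accuracy(TRAIN, TEST))
```

Keeping the training and test records in separate collections means the evaluation never sees data used for fitting, which is the point of testing on an independent cohort.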


  • Item ORO ID: 77724
  • Item Type: Journal Item
  • ISSN: 2072-6694
  • Keywords: pan-cancer, classification, decision tree, driver mutation, extreme gradient boosting, k-nearest neighbours, logistic regression, multilayer perceptron, random forest, support vector machines
  • Academic Unit or School: Other Departments > Other Departments
  • Copyright Holders: © 2021 Ionut Dragomir, © 2021 Adnan Akbar, © 2021 John W. Cassidy, © 2021 Nirmesh Patel, © 2021 Harry W. Clifford, © 2021 Gianmarco Contino
  • SWORD Depositor: Jisc Publications-Router
  • Depositing User: Jisc Publications-Router