Berrar, Daniel
(2016).
DOI: https://doi.org/10.1007/978-3-319-46672-9_6
Abstract
Performance measures play a pivotal role in the evaluation and selection of machine learning models for a wide range of applications. Using both synthetic and real-world data sets, we investigated the resilience of various ranking measures to noise. Our experiments revealed that the area under the ROC curve (AUC) and a related measure, the truncated average Kolmogorov-Smirnov statistic (taKS), can reliably discriminate between models with truly different performance under various types and levels of noise. With increasing class skew, however, the H-measure and estimators of the area under the precision-recall curve become preferable. Because of its simple graphical interpretation and robustness, the lower trapezoid estimator of the area under the precision-recall curve is recommended for highly imbalanced data sets.
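To make concrete what a ranking measure computes, the following minimal Python sketch evaluates two of the quantities named above from a list of classifier scores. The AUC is computed as the Mann-Whitney probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half. The plain two-sample Kolmogorov-Smirnov statistic is the largest gap between the empirical score distributions of the two classes. The sketch assumes labels coded as 1 (positive) and 0 (negative), and the function names are hypothetical. It does not reproduce the truncated average variant taKS, the H-measure, or the precision-recall estimators studied in the paper.

```python
from itertools import product


def auc_mann_whitney(scores, labels):
    """AUC as P(score of random positive > score of random negative).

    Assumes labels are coded 1 (positive) and 0 (negative).
    Tied positive/negative pairs contribute 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    # Compare every positive with every negative (O(P * N) pairs).
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def ks_statistic(scores, labels):
    """Two-sample Kolmogorov-Smirnov statistic between class-wise scores.

    Returns max_t |F_pos(t) - F_neg(t)|, where F_pos and F_neg are the
    empirical CDFs of the positive and negative class scores.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    best = 0.0
    # Both ECDFs are step functions that only change at observed scores,
    # so checking each observed score is enough to find the supremum.
    for t in sorted(set(scores)):
        f_pos = sum(s <= t for s in pos) / len(pos)
        f_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(f_pos - f_neg))
    return best


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
    labels = [1, 1, 0, 1, 0, 0]
    print(round(auc_mann_whitney(scores, labels), 4))  # 0.8889 (8 of 9 pairs)
    print(round(ks_statistic(scores, labels), 4))      # 0.6667
```

Both functions depend only on the ordering of the scores, not on their calibration, which is what makes them ranking measures. The pairwise loop keeps the AUC definition explicit; for large data sets, a rank-sum formulation gives the same value in O(n log n) time.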