Sehgal, Shoaib; Gondal, Iqbal and Dooley, Laurence S.
Missing values imputation for cDNA microarray data using ranked covariance vectors.
International Journal of Hybrid Intelligent Systems, 2(4),
Microarray data are used in a range of application areas in biology, from diagnosis through to drug discovery; however, such data often contain multiple missing gene expression values that degrade the performance of statistical and machine learning algorithms. This paper presents a new k-Ranked Covariance-based Missing Value Imputation (KRCOV) algorithm, which demonstrates superior imputation performance compared to the popular k-Nearest Neighbour (KNN) technique in estimating missing values in the BRCA1, BRCA2 and Sporadic genetic mutation samples present in ovarian and breast cancer. By exploiting the strong correlation between samples, KRCOV consistently outperforms the KNN and zero-imputation techniques in terms of estimation error, significance test and classification accuracy when approximating randomly occurring missing values in the range 1% to 5%. The Generalized Regression Neural Network (GRNN) classifier is applied as it repeatedly provides improved classification performance for ovarian and breast cancer microarray data. The theoretical foundations of KRCOV are presented, and a self-correcting error property is investigated that guarantees the new algorithm generates a lower error than KNN when estimating randomly introduced missing values, for the same order of computational complexity.
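For readers unfamiliar with the two baselines named in the abstract, the following is a minimal sketch of zero-imputation and a generic KNN imputation, together with one common form of normalized RMS error. It is not the KRCOV algorithm and is not taken from the paper. The function names, the inverse-distance weighting, the root-mean-square distance over shared columns, the error definition and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def zero_impute(data):
    """Replace every missing entry (None) with 0.0."""
    return [[0.0 if v is None else v for v in row] for row in data]


def knn_impute(data, k=3):
    """Generic KNN imputation sketch (rows = genes, columns = samples).

    Assumed scheme: for each missing entry, find the k genes closest to the
    target gene (RMS distance over columns observed in both) that have a
    value in the missing column, then take an inverse-distance weighted mean.
    """
    result = [list(row) for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(data):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                weights = [1.0 / (d + 1e-9) for d, _ in nearest]
                total = sum(w * v for w, (_, v) in zip(weights, nearest))
                result[i][j] = total / sum(weights)
            else:
                result[i][j] = 0.0  # fall back to zero-imputation
    return result


def nrms_error(truth, estimate, positions):
    """Assumed error metric: RMS error over the masked positions,
    normalized by the RMS of the true values at those positions."""
    num = sum((estimate[i][j] - truth[i][j]) ** 2 for i, j in positions)
    den = sum(truth[i][j] ** 2 for i, j in positions)
    return math.sqrt(num / den) if den > 0 else float("inf")


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for expression data: two groups of correlated genes.
    patterns = [[1, 2, 3, 4, 5, 6, 7, 8], [8, 7, 6, 5, 4, 3, 2, 1]]
    truth = [[p + rng.uniform(-0.05, 0.05) for p in patterns[g % 2]]
             for g in range(20)]
    cells = [(i, j) for i in range(20) for j in range(8)]
    masked = rng.sample(cells, 8)  # 5% of the 160 entries
    observed = [list(row) for row in truth]
    for i, j in masked:
        observed[i][j] = None
    print("zero:", nrms_error(truth, zero_impute(observed), masked))
    print("knn: ", nrms_error(truth, knn_impute(observed, k=3), masked))
```

Under this error definition, zero-imputation scores exactly 1 whenever the true values are non-zero, which makes it a natural reference point against which any neighbour-based estimate can be compared.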