Sehgal, Shoaib; Gondal, Iqbal and Dooley, Laurence S.
Microarray data are used in a range of application areas in biology, from diagnosis through to drug discovery; however, such data often contain multiple missing genetic expression values that degrade the performance of statistical and machine learning algorithms. This paper presents a new k-Ranked Covariance-based Missing Value Imputation (KRCOV) algorithm which demonstrates superior imputation performance compared to the popular k-Nearest Neighbour (KNN) technique in estimating missing values in the BRCA1, BRCA2 and Sporadic genetic mutation samples present in ovarian and breast cancer. By exploiting the strong correlation between samples, KRCOV consistently outperforms the KNN and zero-imputation techniques in terms of estimation error, significance testing and classification accuracy when approximating randomly occurring missing values in the range 1% to 5%. The Generalized Regression Neural Network (GRNN) classifier is applied as it repeatedly provides improved classification performance for ovarian and breast cancer microarray data. The theoretical foundations of KRCOV are presented, and a self-correcting error property is investigated that guarantees the new algorithm generates a lower error than KNN when estimating randomly introduced missing values, for the same order of computational complexity.
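To make the evaluation setting in the abstract concrete, the following is a minimal sketch of the two baselines it names (zero-imputation and KNN imputation) applied to randomly removed entries. It is not the KRCOV algorithm, whose details are not given in this record. The function names, the choice of Euclidean distance over shared observed columns, the value of `k`, and the normalised RMS error definition are all assumptions made for illustration.

```python
import numpy as np


def mask_random(data, fraction, rng):
    """Return a copy of `data` with roughly `fraction` of entries set to NaN,
    together with the boolean mask of removed entries."""
    mask = rng.random(data.shape) < fraction
    masked = data.copy()
    masked[mask] = np.nan
    return masked, mask


def zero_impute(masked):
    """Baseline: replace every missing entry with zero."""
    return np.where(np.isnan(masked), 0.0, masked)


def knn_impute(masked, k=3):
    """Baseline: fill each missing entry with the mean of that column's value
    in the k nearest rows. Distance is the RMS difference over the columns
    both rows have observed (an assumed, illustrative choice)."""
    filled = masked.copy()
    n_rows = masked.shape[0]
    for i in range(n_rows):
        row = masked[i]
        for j in np.flatnonzero(np.isnan(row)):
            candidates = []
            for r in range(n_rows):
                if r == i or np.isnan(masked[r, j]):
                    continue
                shared = ~np.isnan(row) & ~np.isnan(masked[r])
                if not shared.any():
                    continue
                diff = row[shared] - masked[r, shared]
                candidates.append((np.sqrt(np.mean(diff ** 2)), masked[r, j]))
            if candidates:
                candidates.sort(key=lambda c: c[0])
                filled[i, j] = np.mean([value for _, value in candidates[:k]])
            else:
                filled[i, j] = 0.0  # no usable neighbour: fall back to zero
    return filled


def nrmse(truth, estimate, mask):
    """RMS error over the removed entries, normalised by the RMS of the
    true values at those positions (assumed error measure)."""
    err = np.sqrt(np.mean((truth[mask] - estimate[mask]) ** 2))
    return err / np.sqrt(np.mean(truth[mask] ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for an expression matrix (rows: genes, columns: samples).
    truth = rng.normal(size=(200, 10))
    for fraction in (0.01, 0.03, 0.05):
        masked, mask = mask_random(truth, fraction, rng)
        print(fraction,
              nrmse(truth, zero_impute(masked), mask),
              nrmse(truth, knn_impute(masked), mask))
```

The synthetic matrix here is uncorrelated noise, so it says nothing about the relative performance reported in the paper; substituting a real expression matrix would be needed for any meaningful comparison.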
|Item Type:||Journal Article|
|Academic Unit/Department:||Mathematics, Computing and Technology > Computing & Communications|
|Interdisciplinary Research Centre:||Centre for Research in Computing (CRC)|
|Depositing User:||Laurence Dooley|
|Date Deposited:||10 Apr 2008|
|Last Modified:||02 Dec 2010 20:07|