Sehgal, Shoaib; Gondal, Iqbal and Dooley, Laurence
(2006). Missing Value Imputation Framework for Microarray Significant Gene Selection and Class Prediction.
In: Not Set ed.
Data Mining for Biomedical Applications.
Lecture Notes in Computer Science, 3916.
Berlin: Springer Verlag, pp. 131–142.
Microarray data is used in a large number of applications ranging from diagnosis through to drug discovery. Such data however, often contains multiple missing genetic expressions which are generally ignored thus degrading the reliability of inferred results. This paper presents an innovative and robust imputation framework that more accurately estimates missing values leading subsequently to better gene selection and class prediction. To prove this premise, several missing value techniques including the Collateral Missing Values Estimation (CMVE), Bayesian Principal Component Analysis (BPCA), Least Square Impute (LSImpute), k-Nearest Neighbour (KNN) and ZeroImpute are analysed. A combination of univariate and multiple gene selection methods, namely, Between Group to within Group Sum of Squares and Weighted Partial Least Squares is then performed before applying class prediction using the Ridge Partial Least Square method. Overall, CMVE imputation consistently provided superior missing values estimation accuracy compared with the other algorithms examined, by virtue of exploiting local and global as well as positive and negative correlations between genes, with all empirical results being corroborated by the two-sided Wilcoxon Rank sum statistical significance test.
Actions (login may be required)