%0 Journal Article
%T Impact of Missing Value Imputation on Classification for DNA Microarray Gene Expression Data—A Model-Based Study
%A Youting Sun
%A Ulisses Braga-Neto
%A Edward R Dougherty
%J EURASIP Journal on Bioinformatics and Systems Biology
%D 2010
%I BioMed Central
%R 10.1155/2009/504069
%X Microarray data frequently contain missing values (MVs) because imperfections in data preparation steps (e.g., poor hybridization or chip contamination by dust and scratches) produce erroneous, low-quality values, which are usually discarded and referred to as missing. It is common for gene expression data to contain at least 5% MVs, and in many publicly accessible datasets more than 60% of the genes have MVs [1]. Microarray gene expression data are usually organized as a matrix whose rows correspond to gene probes and whose columns correspond to arrays. Trivial methods for dealing with MVs in the microarray data matrix include replacing each MV by zero (given that the data are in the log domain) or by the row average (RAVG). These methods do not use the underlying correlation structure of the data and therefore often perform poorly in terms of estimation accuracy. Better imputation techniques have been developed that estimate the MVs by exploiting the observed data structure and expression patterns. These methods include K-nearest neighbor imputation (KNNimpute) and singular value decomposition (SVD)-based imputation [2], Bayesian principal component analysis (BPCA) [3], least squares regression-based imputation [4], local least squares imputation (LLS) [5], and LinCmb imputation [6], in which each MV is calculated as a convex combination of the estimates given by several existing imputation methods, namely, RAVG, KNNimpute, SVD, and BPCA. In addition, a nonlinear PCA imputation method based on neural networks was proposed for dealing effectively with nonlinearly structured microarray data [7]. Gene ontology-based imputation uses information on functional similarities to guide the selection of relevant genes for MV estimation [8].
The integrative MV estimation method (iMISS) aims to improve MV estimation for datasets with a limited number of samples by incorporating information from multiple microarray datasets [9]. In most studies of MV imputation, the perform
%U http://bsb.eurasipjournals.com/content/2009/1/504069
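The simpler imputation strategies named in the abstract can be sketched in a few lines. The sketch below is a minimal illustration, assuming NumPy, a genes × arrays matrix with missing values stored as NaN, and every gene having at least one observed value. The function names, the default `k`, and the inverse-distance weighting over co-observed arrays in `impute_knn` are illustrative assumptions, not the authors' implementation. `convex_combination` only mimics the convex-combination idea behind LinCmb with caller-supplied weights; it does not attempt to reproduce how LinCmb chooses its weights.

```python
import numpy as np


def impute_zero(X):
    """Replace every missing value (NaN) by 0, as is sometimes done for log-domain data."""
    Y = X.copy()
    Y[np.isnan(Y)] = 0.0
    return Y


def impute_row_average(X):
    """RAVG: replace each MV by the mean of the observed values in its row (gene).
    Assumes every row has at least one observed value."""
    Y = X.copy()
    row_means = np.nanmean(Y, axis=1)  # NaN-aware per-gene mean
    rows, cols = np.where(np.isnan(Y))
    Y[rows, cols] = row_means[rows]
    return Y


def impute_knn(X, k=3, eps=1e-12):
    """Simplified KNNimpute-style sketch (hypothetical details): fill X[i, j] with an
    inverse-distance-weighted average of column j over the k genes closest to gene i,
    where distance is the RMS difference over arrays observed in both genes."""
    Y = X.copy()
    observed = ~np.isnan(X)
    for i, j in zip(*np.where(~observed)):
        # Candidate neighbours: other genes whose value in array j is observed.
        candidates = [g for g in range(X.shape[0]) if g != i and observed[g, j]]
        dists = []
        for g in candidates:
            shared = observed[i] & observed[g]  # arrays observed in both genes
            if not shared.any():
                continue
            diff = X[i, shared] - X[g, shared]
            dists.append((np.sqrt(np.mean(diff ** 2)), g))
        if not dists:
            continue  # no usable neighbour: leave this MV for another method
        dists.sort()
        nearest = dists[:k]
        w = np.array([1.0 / (d + eps) for d, _ in nearest])
        vals = np.array([X[g, j] for _, g in nearest])
        # Neighbour values come from the original X, so imputed values never propagate.
        Y[i, j] = np.dot(w, vals) / w.sum()
    return Y


def convex_combination(estimates, weights):
    """Combine several imputed matrices with nonnegative weights summing to 1
    (LinCmb-like idea; here the weights are simply supplied by the caller)."""
    w = np.asarray(weights, dtype=float)
    if len(estimates) != len(w) or np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be nonnegative, sum to 1, and match the estimates")
    return sum(wi * E for wi, E in zip(w, estimates))
```

On a toy matrix where gene 0 is missing one value and genes 1 and 2 track it closely, RAVG fills the gap with the mean of gene 0's other values, whereas the KNN sketch borrows from the neighbouring genes' values in that same array. This illustrates why methods that exploit correlation between genes can estimate MVs more accurately than row averaging.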