%0 Journal Article %T A comparison of univariate and multivariate gene selection techniques for classification of cancer datasets %A Carmen Lai %A Marcel JT Reinders %A Laura J van't Veer %A Lodewyk FA Wessels %J BMC Bioinformatics %D 2006 %I BioMed Central %R 10.1186/1471-2105-7-235 %X In this study we adopted an unbiased protocol to perform a fair comparison of frequently used multivariate and univariate gene selection techniques, in combination with a range of classifiers. Our conclusions are based on seven gene expression datasets, across several cancer types. Our experiments illustrate that, contrary to several previous studies, in five of the seven datasets univariate selection approaches yield consistently better results than multivariate approaches. The simplest multivariate selection approach, the Top Scoring method, achieves the best results on the remaining two datasets. We conclude that the correlation structures, if present, are difficult to extract due to the small number of samples, and that consequently, overly complex gene selection algorithms that attempt to extract these structures are prone to overtraining. Gene expression microarrays enable the measurement of the activity levels of thousands of genes on a single glass slide. The number of genes (features) is on the order of thousands, while the number of arrays is usually limited to several hundred, due to the high cost associated with the procedure and the limited sample availability. In classification tasks a reduction of the feature space is usually performed [1,2]. On the one hand, it decreases the complexity of the classification task and thus improves the classification performance [3-7]. This is especially true when the classifiers employed are sensitive to noise. On the other hand, it identifies relevant genes that can be potential biomarkers for the problem under study, and can be used in the clinic or for further studies, e.g.
as targets for new types of therapies. A widely used search strategy employs a criterion to evaluate the informativeness of each gene individually. We refer to this approach as univariate gene selection. Several criteria have been proposed in the literature; for example, Golub et al. [8] introduced the signal-to-noise ratio (SNR), also employed in [9,10]. Bendor et %U http://www.biomedcentral.com/1471-2105/7/235