|
Optimal classifier selection and negative bias in error rate estimation: an empirical study on high-dimensional prediction

Abstract: In our study we consider a total of 124 variants of classifiers (possibly including variable selection or tuning steps) within a cross-validation evaluation scheme. The classifiers are applied to original and modified real microarray data sets, some of which are obtained by randomly permuting the class labels to mimic non-informative predictors while preserving their correlation structure.

We assess the minimal misclassification rate over the different variants of classifiers in order to quantify the bias arising when the optimal classifier is selected a posteriori in a data-driven manner. The bias resulting from the parameter tuning (including gene selection parameters as a special case) and the bias resulting from the choice of the classification method are examined both separately and jointly.

The median minimal error rate over the investigated classifiers was as low as 31% and 41% based on permuted uninformative predictors from studies on colon cancer and prostate cancer, respectively. We conclude that the strategy to present only the optimal result is not acceptable because it yields a substantial bias in error rate estimation, and suggest alternative approaches for properly reporting classification accuracy.

It is well-known that almost all published studies present positive research results, as outlined by Kyzas et al. [1] for the special case of prostate cancer. In the case of microarray studies, which often focus on the identification of differentially expressed genes or the construction of outcome prediction rules, this means that almost all studies report at least a few significant differentially expressed genes or a small prediction error, respectively.

According to Ioannidis [2], "[...] most published research findings are wrong".
This may be partly due to the editorial policy of many journals, which accept almost exclusively papers presenting positive research results (except perhaps recent initiatives like the Journal of Negative Research Results in Medicine). Aut
|
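The selection effect described in the abstract can be illustrated with a small simulation. The sketch below is not the study's procedure: it assumes simulated Gaussian noise instead of permuted microarray data, a single nearest-centroid rule with six gene-selection settings instead of 124 classifier variants, and hypothetical function names (`simulate_null_data`, `cv_errors`, `selection_bias_demo`). It only shows why the minimum of several cross-validated error rates is an optimistic estimate even when each individual estimate is computed correctly.

```python
import random
from statistics import median


def simulate_null_data(n_samples, n_genes, rng):
    """Gaussian noise predictors with balanced, randomly ordered class labels.

    Assumption: noise stands in for the permuted real data used in the study.
    """
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    rng.shuffle(y)
    return X, y


def class_centroids(X, y):
    """Per-class mean of every gene."""
    centroids = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def cv_errors(X, y, gene_counts, n_folds=5):
    """Cross-validated error rate of a nearest-centroid rule per gene count.

    Gene selection is redone inside every training fold, so each single
    estimate is honest; only the later choice of the minimum is not.
    """
    n = len(y)
    wrong = {k: 0 for k in gene_counts}
    for fold in range(n_folds):
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        cent = class_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        # Rank genes by absolute difference of class means on the training fold.
        ranked = sorted(range(len(X[0])), key=lambda j: -abs(cent[0][j] - cent[1][j]))
        for k in gene_counts:
            genes = ranked[:k]
            for i in test_idx:
                dist = [sum((X[i][j] - cent[c][j]) ** 2 for j in genes) for c in (0, 1)]
                predicted = 0 if dist[0] <= dist[1] else 1
                wrong[k] += predicted != y[i]
    return {k: wrong[k] / n for k in gene_counts}


def selection_bias_demo(n_replicates=20, seed=0):
    """Median over replicates of the minimal and of the average error rate."""
    rng = random.Random(seed)
    gene_counts = [1, 2, 5, 10, 20, 50]  # illustrative tuning grid
    minimal, average = [], []
    for _ in range(n_replicates):
        X, y = simulate_null_data(n_samples=40, n_genes=200, rng=rng)
        errors = list(cv_errors(X, y, gene_counts).values())
        minimal.append(min(errors))
        average.append(sum(errors) / len(errors))
    return median(minimal), median(average)


if __name__ == "__main__":
    med_min, med_avg = selection_bias_demo()
    print(f"median minimal error: {med_min:.3f}")
    print(f"median average error: {med_avg:.3f}")
```

Because the labels carry no information, the true error rate of every variant is 50%. The average over variants should stay close to that value, while the reported minimum can only be lower or equal, which is the negative bias the study quantifies on real data.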