%0 Journal Article
%T Conditional variable importance for random forests
%A Carolin Strobl
%A Anne-Laure Boulesteix
%A Thomas Kneib
%A Thomas Augustin
%A Achim Zeileis
%J BMC Bioinformatics
%D 2008
%I BioMed Central
%R 10.1186/1471-2105-9-307
%X We identify two mechanisms responsible for this finding: (i) A preference for the selection of correlated predictors in the tree building process and (ii) an additional advantage for correlated predictor variables induced by the unconditional permutation scheme that is employed in the computation of the variable importance measure. Based on these considerations we develop a new, conditional permutation scheme for the computation of the variable importance measure. The resulting conditional variable importance reflects the true impact of each predictor variable more reliably than the original marginal approach. Within the past few years, random forests [1] have become a popular and widely-used tool for non-parametric regression in many scientific areas. They show high predictive accuracy and are applicable even in high-dimensional problems with highly correlated variables, a situation which often occurs in bioinformatics. Recently, the variable importance measures yielded by random forests have also been suggested for the selection of relevant predictor variables in the analysis of microarray data, DNA sequencing and other applications [2-5]. Identifying relevant predictor variables, rather than only predicting the response by means of some "black-box" model, is of interest in many applications. By means of variable importance measures the candidate predictor variables can be compared with respect to their impact in predicting the response or even their causal effect (see, e.g., [6] for assumptions necessary for interpreting the importance of a variable as a causal effect).
In this case a key advantage of random forest variable importance measures, as compared to univariate screening methods, is that they cover the impact of each predictor variable individually as well as in multivariate interactions with other predictor variables. For example, Lunetta et al. [2] find that genetic markers relevant in interactions with other markers or environmental variables can be dete
%U http://www.biomedcentral.com/1471-2105/9/307