%0 Journal Article %T Validation and functional annotation of expression-based clusters based on gene ontology %A Ralf Steuer %A Peter Humburg %A Joachim Selbig %J BMC Bioinformatics %D 2006 %I BioMed Central %R 10.1186/1471-2105-7-380 %X In this work, we propose using the information-theoretic concept of mutual information to investigate the relationship between groups of genes, as given by data-driven clustering, and their respective functional categories. Drawing upon related approaches (Gibbons and Roth, Genome Research 12:1574-1581, 2002), we seek to quantify to what extent individual attributes are sufficient to characterize a given group or cluster of genes. We show that the mutual information provides a systematic framework to assess the relationship between groups or clusters of genes and their functional annotations in a quantitative way. Within this framework, the mutual information allows us to address and incorporate several important issues, such as the interdependence of functional annotations and combinations of attributes. It thus supplements and extends the conventional search for overrepresented attributes within a group or cluster of genes. In particular, by taking combinations of attributes into account, the mutual information opens the way to uncovering specific functional descriptions of a group of genes or a clustering result. All datasets and functional annotations used in this study are publicly available. All scripts used in the analysis are provided as additional files. One of the common assertions in expression analysis is that genes sharing a similar pattern of expression are more likely to be involved in the same regulatory processes [1]. This proposition, commonly referred to as 'guilt-by-association', has been exploited by a large number of clustering algorithms, which group genes into a (small) number of classes based on the similarity of their expression profiles.
While there are still many open problems associated with choosing a particular algorithm, clustering has already proven successful in a multitude of applications, such as the inference of putative functional annotations [2,3] and the extraction of regulatory motifs in the upstream regions of genes [4,5]. %U http://www.biomedcentral.com/1471-2105/7/380