BMC Bioinformatics 2007
Evaluation of gene-expression clustering via mutual information distance measure

Abstract: Relying on several public gene expression datasets, we evaluate the homogeneity and separation scores of different clustering solutions. It was found that the mutual information (MI) measure yields a more significant differentiation among erroneous clustering solutions. The proposed measure was also used to analyze the performance of several known clustering algorithms. A comparative study of these algorithms reveals that their "best solutions" are ranked almost oppositely when different distance measures are used, despite the correspondence found between these measures when analyzing the averaged scores of groups of solutions. In view of these results, further attention should be paid to the selection of a proper distance measure for analyzing the clustering of gene expression data.

In recent years, DNA microarray technology has become a vital scientific tool for the global analysis of genes and their networks. The technology allows simultaneous profiling of the expression levels of thousands of genes in a single experiment. At the same time, the successful implementation of microarray technology has required new methods for analyzing such large-scale datasets. Clustering is a central method for analyzing gene-expression data and has been applied extensively in various works and applications [1-5]. The primary goal is to cluster together genes or tissues that manifest similar expression patterns [1]. The underlying assumption is that co-expressed genes or tissues with correlated pathways may share common functional tasks and regulatory mechanisms.
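The abstract scores clustering solutions by homogeneity and separation, computed under a mutual information (MI) distance between expression profiles. The sketch below illustrates the idea only and is not the authors' estimator. It rests on several assumptions:

- each profile is discretized into three equal-width bins;
- the distance is the normalized form d = 1 - I(X;Y)/H(X,Y), which lies in [0, 1];
- homogeneity is the mean within-cluster pairwise distance, and separation is the mean between-cluster pairwise distance.

All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def discretize(profile, n_bins=3):
    """Map each expression value to an equal-width bin index in [0, n_bins)."""
    lo, hi = min(profile), max(profile)
    if hi == lo:  # constant profile: all values share one bin
        return [0] * len(profile)
    width = (hi - lo) / n_bins
    # min(...) puts the maximum value into the last bin rather than bin n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in profile]


def entropy(counts, n):
    """Shannon entropy (nats) of the empirical distribution given by counts."""
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def mi_distance(x, y, n_bins=3):
    """Assumed MI distance d = 1 - I(X;Y) / H(X,Y), which lies in [0, 1]."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(bx)
    h_x = entropy(Counter(bx), n)
    h_y = entropy(Counter(by), n)
    h_xy = entropy(Counter(zip(bx, by)), n)
    if h_xy == 0:  # both profiles constant: treat them as identical
        return 0.0
    mi = h_x + h_y - h_xy  # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return 1.0 - mi / h_xy


def _mean(values):
    """Arithmetic mean, or NaN when there are no values to average."""
    return sum(values) / len(values) if values else float("nan")


def homogeneity_separation(profiles, labels, dist=mi_distance):
    """Return (mean within-cluster distance, mean between-cluster distance).

    With a distance measure, a lower homogeneity score (tighter clusters)
    and a higher separation score indicate a better clustering solution.
    """
    within, between = [], []
    for i, j in combinations(range(len(profiles)), 2):
        d = dist(profiles[i], profiles[j])
        (within if labels[i] == labels[j] else between).append(d)
    return _mean(within), _mean(between)


if __name__ == "__main__":
    profiles = [
        [1, 2, 3, 4, 5, 6],
        [2, 4, 6, 8, 10, 12],    # scaled copy of profile 0
        [1, 3.5, 6, 1, 3.5, 6],
        [2, 7, 12, 2, 7, 12],    # scaled copy of profile 2
    ]
    print(homogeneity_separation(profiles, [0, 0, 1, 1]))  # groups kept apart
    print(homogeneity_separation(profiles, [0, 1, 0, 1]))  # groups mixed
```

With these toy profiles, keeping the two groups apart gives a homogeneity of 0 and a separation of about 0.77. Mixing the groups raises homogeneity to about 0.77 and lowers separation to about 0.39. MI captures any statistical dependence, not only linear correlation. As a result, a profile and its mirror image (for example [1, 2, 3, 4, 5, 6] and [6, 5, 4, 3, 2, 1]) also get distance 0 in this sketch. By contrast, 1 minus the Pearson correlation would give that pair its maximum value of 2.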
Similar expression patterns might offer insights into various transcriptional and biological processes [6-8].

Many clustering algorithms depend heavily on 'similarity' or 'distance' measures (although not necessarily distance functions that satisfy all the mathematical conditions of a metric) that quantify the degree of association between expression profiles. The definition of the distance measure is a key factor for a successful identificati