|
BMC Bioinformatics 2007
Simcluster: clustering enumeration gene expression data on the simplex space

Abstract: Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster.

Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.

Technologies for high-throughput measurement of transcriptional gene expression are mainly divided into two categories: those based on hybridization, such as all microarray-related technologies [1,2], and those based on transcript enumeration, which include SAGE [3], MPSS [4], and Digital Northern powered by traditional [5] or recently developed EST sequencing-by-synthesis (SBS) technologies [6].

Currently, transcript enumeration methods are relatively expensive and more time-consuming than methods based on hybridization. However, recent improvements in sequencing technology, powered by the "$1000 genome" effort [7], promise to transform the transcript enumeration approach into a fast and accessible alternative [8-10], paving the way for a systems-level absolute digital description of individualized samples [11].

Methods for finding differentially expressed genes have been developed specifically in the context of enumeration-based techniques of different sequencing scales such as EST [12], SAGE [13] and MPSS [14].
However, in spite of their differences, hybridization-based and enumeration-based data are typically analyzed using the same pattern recognition techniques, which are generally imported from the microarray analysis field.

In the case of clustering analysis of gene profiles, the simple appropriation of practices from the microarray analysis field has been shown to lead to suboptimal pe
|