
Mining biological information from 3D short time-series gene expression data: the OPTricluster algorithm

DOI: 10.1186/1471-2105-13-54



We developed a subspace clustering algorithm called Order Preserving Triclustering (OPTricluster) for 3D short time-series data mining. OPTricluster identifies 3D clusters with coherent evolution in a given 3D dataset using a combinatorial approach on the sample dimension and the order-preserving (OP) concept on the time dimension. Combining the two methodologies allows one to study similarities and differences between samples in terms of their temporal expression profiles. OPTricluster has been successfully applied to four case studies: immune response in mice infected by malaria (Plasmodium chabaudi), systemic acquired resistance in Arabidopsis thaliana, similarities and differences between the inner and outer cotyledon in Brassica napus during seed development, and Brassica napus whole seed development. These studies showed that OPTricluster is robust to noise and is able to detect the similarities and differences between biological samples.

Our analysis showed that OPTricluster generally outperforms other well-known clustering algorithms such as TRICLUSTER, gTRICLUSTER and K-means; it is robust to noise and can effectively mine the biological knowledge hidden in 3D short time-series gene expression data.

Clustering of co-expressed genes has been an active data mining topic and has advanced in parallel with the development of microarray technology [1]. There is a vast literature on clustering algorithms developed for microarray data analysis [1]. Microarray gene expression data can be classified into two categories: steady-state and time-series gene expression data [2]. Time-series gene expression data are widely used to study the dynamic behaviour of various biological processes in the cell [3-5]. They can be classified into two categories (relative to the clustering algorithms designed for their analysis): short time-series, corresponding to 3-8 time points [6], and long time-series, corresponding to more than 8 time points. Short time-se
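
The order-preserving idea named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the algorithm's details, so the function names (`order_pattern`, `op_triclusters`), the `min_genes` parameter, the nested-dictionary data layout, and the toy values are all assumptions made for illustration. The sketch ranks each gene's time points within each sample, then enumerates every subset of samples (the combinatorial step) and groups genes whose ranking is identical across all samples in that subset.

```python
from collections import defaultdict
from itertools import combinations


def order_pattern(values):
    """Return the time-point indices sorted by increasing expression value."""
    return tuple(sorted(range(len(values)), key=lambda t: values[t]))


def op_triclusters(data, min_genes=2):
    """Group genes that share one temporal ordering across a subset of samples.

    data[gene][sample] is a list of expression values over the time points.
    Returns a dict mapping (sample_subset, pattern) to a list of gene names.
    Hypothetical helper: a simplified illustration, not the published method.
    """
    genes = sorted(data)
    samples = sorted(next(iter(data.values())))
    clusters = {}
    # Combinatorial step on the sample dimension: try every non-empty subset.
    for size in range(1, len(samples) + 1):
        for subset in combinations(samples, size):
            groups = defaultdict(list)
            for gene in genes:
                patterns = {order_pattern(data[gene][s]) for s in subset}
                # Keep the gene only if its ordering agrees in all samples.
                if len(patterns) == 1:
                    groups[patterns.pop()].append(gene)
            for pattern, members in groups.items():
                if len(members) >= min_genes:
                    clusters[(subset, pattern)] = members
    return clusters


# Toy data (made up): three genes, two samples, three time points.
data = {
    "g1": {"A": [1.0, 2.0, 3.0], "B": [0.5, 1.5, 2.5]},
    "g2": {"A": [2.0, 4.0, 9.0], "B": [3.0, 2.0, 1.0]},
    "g3": {"A": [0.1, 0.2, 0.3], "B": [1.0, 2.0, 4.0]},
}
clusters = op_triclusters(data)
```

In this toy example, all three genes increase over time in sample A, but only g1 and g3 also increase in sample B, so the subset (A, B) yields a cluster containing g1 and g3. Enumerating all sample subsets grows exponentially with the number of samples, which is tolerable only when the sample dimension is small.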

