BMC Bioinformatics 2010
Comparative study of discretization methods of microarray data for inferring transcriptional regulatory networks

Abstract: In this study, we propose a new discretization method, "bikmeans", and compare its performance with that of four other widely used discretization methods across different datasets, modeling algorithms and numbers of intervals. Sensitivities, specificities and total accuracies were calculated, and statistical analysis was carried out. The bikmeans method always gave high total accuracies. Our results indicate that a proper choice of discretization method can consistently improve gene regulatory network inference, independent of the network modeling algorithm and dataset. Our new method, bikmeans, yielded significantly better total accuracies than the other methods.

Inferring gene regulatory networks (GRN) from time-course microarray data is one of the most important goals in systems biology [1]. A number of algorithms have been proposed to infer transcriptional networks, including Boolean Networks [2,3], Gaussian Networks [4], Bayesian Networks [5,6] and Dynamic Bayesian Networks [7]. Most algorithms require discrete data as input. However, the choice of discretization method is often arbitrary because of a lack of empirical data on how different discretization methods perform. For biclustering time-series gene expression data, discretization methods based on transitions between time points obtain better results than those based on absolute values [8]. We therefore hypothesized that some discretization methods would produce better results than others when inferring GRN.

Many discretization methods commonly used in data mining and knowledge discovery have also been used to discretize time-series gene expression data (see [8] for review).
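To make the distinction between absolute-value and transition-based discretization more concrete, the Python sketch below contrasts two simple unsupervised encodings of a single gene's time course. It is an illustration under stated assumptions, not the bikmeans method or any of the four methods compared in this study: the function names `discretize_absolute` and `discretize_transitions`, the use of equal-width binning, the default of three bins and the tolerance `tol` are all hypothetical choices made for the example.

```python
from typing import List


def discretize_absolute(values: List[float], n_bins: int = 3) -> List[int]:
    """Hypothetical equal-width discretization of absolute expression values.

    Each value is mapped to a level 0..n_bins-1 according to where it falls
    in [min(values), max(values)] split into n_bins intervals of equal width.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat profile carries no information; put every point in level 0.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value in the top bin instead of bin n_bins.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def discretize_transitions(values: List[float], tol: float = 0.1) -> List[int]:
    """Hypothetical transition-based discretization.

    Each step t -> t+1 is encoded as +1 (increase larger than tol),
    -1 (decrease larger than tol) or 0 (change within +/- tol), so a profile
    of n time points yields n-1 states.
    """
    states = []
    for prev, curr in zip(values, values[1:]):
        delta = curr - prev
        if delta > tol:
            states.append(1)
        elif delta < -tol:
            states.append(-1)
        else:
            states.append(0)
    return states


if __name__ == "__main__":
    profile = [0.2, 0.9, 0.95, 0.4, -0.5, -0.45]  # made-up expression values
    print(discretize_absolute(profile))     # [1, 2, 2, 1, 0, 0]
    print(discretize_transitions(profile))  # [1, 0, -1, -1, 0]
```

The absolute encoding records which expression level each time point reaches, whereas the transition encoding records only the direction of change between consecutive time points; here the small rise from 0.9 to 0.95 and the small recovery from -0.5 to -0.45 both fall within the tolerance and are encoded as 0.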
However, most of these data-mining discretization methods are not suited to preprocessing time-course microarray data; more specifically, they are unsuitable, or perform poorly, when used to discretize gene expression data during GRN inference. Discretization algorithms can be divided into two categories: supervised and unsupervised. S