Reverse engineering of gene regulatory networks (GRNs) is the process of estimating the genetic interactions of a cellular system from gene expression data. In this paper, we propose a novel hybrid systematic algorithm based on a neurofuzzy network for reconstructing GRNs from observational gene expression data when only a small to medium number of measurements is available. The approach uses fuzzy logic to transform gene expression values into qualitative descriptors that can be evaluated using a set of defined rules. The algorithm uses a neurofuzzy network to model the effects of genes on other genes, followed by four stages of decision making to extract gene interactions. One of the main features of the proposed algorithm is that an optimal number of fuzzy rules can be extracted easily and rapidly without overparameterization. Data analysis and simulation are conducted on microarray expression profiles of the S. cerevisiae cell cycle and demonstrate that the proposed algorithm not only selects the patterns of the time-series gene expression data accurately but also provides models with better reconstruction accuracy than four published algorithms: DBNs, VBEM, time delay ARACNE, and PF subjected to LASSO. The accuracy of the proposed approach is evaluated in terms of recall and F-score for the network reconstruction task.
1. Introduction
Biological systems are inherently stochastic, uncertain, and fuzzy [1]. Therefore, research in bioinformatics and computational biology, where computer technologies are applied to manage and analyze biological data and to build computational models, is faced with a great deal of uncertainty. For instance, growth and development as well as environmental stresses can all contribute to changes in gene expression levels. In addition, under such conditions, some genes influence the expression of other genes and their functionalities.
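To make the fuzzification step mentioned in the abstract more concrete, the following minimal sketch maps a gene's expression profile to qualitative descriptors. It is an illustration only: this excerpt does not specify the paper's membership functions or rule base, so the three labels (`low`, `medium`, `high`), the triangular membership shapes, the min-max normalization, and the function names `min_max_normalize`, `fuzzify`, and `describe` are all assumptions.

```python
from typing import Dict, List


def min_max_normalize(profile: List[float]) -> List[float]:
    """Rescale one gene's expression profile to the interval [0, 1]."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        # A flat profile carries no ordering information; place it mid-range.
        return [0.5 for _ in profile]
    return [(v - lo) / (hi - lo) for v in profile]


def fuzzify(x: float) -> Dict[str, float]:
    """Membership degrees of a normalized value in three assumed fuzzy sets.

    The sets are hypothetical triangular functions on [0, 1]:
    'low' peaks at 0, 'medium' at 0.5, and 'high' at 1. Their degrees
    always sum to 1 for any x in [0, 1].
    """
    x = min(max(x, 0.0), 1.0)  # clamp to the normalized range
    return {
        "low": max(0.0, 1.0 - 2.0 * x),
        "medium": 1.0 - abs(2.0 * x - 1.0),
        "high": max(0.0, 2.0 * x - 1.0),
    }


def describe(profile: List[float]) -> List[str]:
    """Label each time point with its strongest qualitative descriptor."""
    labels = []
    for value in min_max_normalize(profile):
        degrees = fuzzify(value)
        # On an exact tie, the first label in dictionary order wins.
        labels.append(max(degrees, key=degrees.get))
    return labels


if __name__ == "__main__":
    # Made-up expression values for a single gene at four time points.
    example_profile = [2.0, 3.0, 6.0, 10.0]
    for value, label in zip(example_profile, describe(example_profile)):
        print(value, label)
```

A rule such as "IF regulator is high THEN target is high" could then be scored by combining the corresponding membership degrees, for instance with a minimum operator. That choice is again an assumption rather than something stated in the text.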
With the advent of high-throughput technologies in transcriptomics, proteomics, and metabolomics, biologists can now investigate the expression of genes and its consequences on a genome-wide scale. Gene expression data from high-throughput microarray experiments measure the amounts of RNA associated with each of thousands of genes in parallel. Time-series microarrays have attracted biologists' interest for deciphering the dynamic and complex nature of biological networks. They record multiple expression profiles at discrete time points (e.g., hours or days apart) of a continuous cellular process. Thus, analytical methods are needed to handle many genes with uncertain
References
[1]
P. Du, J. Gong, E. S. Wurtele, and J. A. Dickerson, “Modeling gene expression networks using fuzzy logic,” IEEE Transactions on Systems, Man, and Cybernetics B, vol. 35, no. 6, pp. 1351–1359, 2005.
[2]
M. K. Kerr and G. A. Churchill, “Experimental design for gene expression microarrays,” Biostatistics, vol. 2, no. 2, pp. 183–202, 2001.
[3]
R. Schmid, P. Baum, C. Ittrich et al., “Comparison of normalization methods for Illumina BeadChip HumanHT-12 v3,” BMC Genomics, vol. 11, no. 1, article 349, 2010.
[4]
J. Rudy and F. Valafar, “Empirical comparison of cross-platform normalization methods for gene expression data,” BMC Bioinformatics, vol. 12, no. 1, article 467, 2011.
[5]
A. W. C. Liew, N. F. Law, and H. Yan, “Missing value imputation for gene expression data: computational techniques to recover missing data from available information,” Briefings in Bioinformatics, vol. 12, no. 5, pp. 498–513, 2011.
[6]
M. B. Eisen, P. T. Spellman, P. O. Brown, and D. Botstein, “Cluster analysis and display of genome-wide expression patterns,” Proceedings of the National Academy of Sciences of the United States of America, vol. 95, no. 25, pp. 14863–14868, 1998.
[7]
P. D'Haeseleer, “How does gene expression clustering work?” Nature Biotechnology, vol. 23, no. 12, pp. 1499–1501, 2005.
[8]
P. Xu, G. N. Brock, and R. S. Parrish, “Modified linear discriminant analysis approaches for classification of high-dimensional microarray data,” Computational Statistics and Data Analysis, vol. 53, no. 5, pp. 1674–1687, 2009.
[9]
Y. Tan and Y. Liu, “Comparison of methods for identifying differentially expressed genes across multiple conditions from microarray data,” Bioinformation, vol. 7, no. 8, pp. 400–404, 2011.
[10]
Y. Huang, I. Tienda Luna, and Y. Wang, “Reverse engineering gene regulatory networks: a survey of statistical models,” IEEE Signal Processing, vol. 59, no. 2, pp. 113–125, 2009.
[11]
X. Cai and X. Wang, “Stochastic modelling and simulation of gene networks—a review of the state-of-the-art research on stochastic simulations,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 27–36, 2007.
[12]
J. T. Trevors, J. D. Elsas, and A. K. Bej, “The molecularly crowded cytoplasm of bacterial cells: dividing cells contrasted with viable but non-culturable (VBNC) bacterial cells,” Current Issues in Molecular Biology, vol. 15, no. 1, pp. 1–6, 2012.
[13]
M. Ding, S. Cui, C. Li et al., “Loss of the tumor suppressor Vhlh leads to upregulation of Cxcr4 and rapidly progressive glomerulonephritis in mice,” Nature Medicine, vol. 12, no. 9, pp. 1081–1087, 2006.
[14]
M. V. Karpuj, M. W. Becher, J. E. Springer et al., “Prolonged survival and decreased abnormal movements in transgenic model of Huntington disease, with administration of the transglutaminase inhibitor cystamine,” Nature Medicine, vol. 8, no. 2, pp. 143–149, 2002.
[15]
U. M. Braga-Neto, “Fads and fallacies in the name of small-sample microarray classification—a highlight of misunderstanding and erroneous usage in the applications of genomic signal processing,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 91–99, 2007.
[16]
J. Ernst, G. J. Nau, and Z. Bar-Joseph, “Clustering short time series gene expression data,” Bioinformatics, vol. 21, no. 1, pp. I159–I168, 2005.
[17]
S. Liang, S. Fuhrman, and R. Somogyi, “Reveal, a general reverse engineering algorithm for inference of genetic network architectures,” Pacific Symposium on Biocomputing, pp. 18–29, 1998.
[18]
T. Akutsu, S. Miyano, and S. Kuhara, “Identification of genetic networks from a small number of gene expression patterns under the Boolean network model,” Pacific Symposium on Biocomputing, pp. 17–28, 1999.
[19]
M. Chaves, E. D. Sontag, and R. Albert, “Methods of robustness analysis for Boolean models of gene control networks,” IEE Proceedings: Systems Biology, vol. 153, no. 4, pp. 154–167, 2006.
[20]
N. Friedman, M. Linial, I. Nachman, and D. Pe'er, “Using Bayesian networks to analyze expression data,” Journal of Computational Biology, vol. 7, no. 3-4, pp. 601–620, 2000.
[21]
S. Y. Kim, S. Imoto, and S. Miyano, “Inferring gene networks from time series microarray data using dynamic Bayesian networks,” Briefings in Bioinformatics, vol. 4, no. 3, pp. 228–235, 2003.
[22]
M. Zou and S. D. Conzen, “A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data,” Bioinformatics, vol. 21, no. 1, pp. 71–79, 2005.
[23]
A. V. Werhli and D. Husmeier, “Reconstructing gene regulatory networks with Bayesian networks by combining expression data with multiple sources of prior knowledge,” Statistical Applications in Genetics and Molecular Biology, vol. 6, no. 1, article 15, 2007.
[24]
A. Schliep, A. Schönhuth, and C. Steinhoff, “Using hidden Markov models to analyze gene expression time course data,” Bioinformatics, vol. 19, supplement 1, pp. i255–i263, 2003.
[25]
H. Toh and K. Horimoto, “Inference of a genetic network by a combined approach of cluster analysis and graphical Gaussian modeling,” Bioinformatics, vol. 18, no. 2, pp. 287–297, 2002.
[26]
M. Quach, N. Brunel, and F. D'alché-Buc, “Estimating parameters and hidden variables in non-linear state-space models based on ODEs for biological networks inference,” Bioinformatics, vol. 23, no. 23, pp. 3209–3216, 2007.
[27]
Z. Wang, F. Yang, D. W. C. Ho, S. Swift, A. Tucker, and X. Liu, “Stochastic dynamic modeling of short gene expression time-series data,” IEEE Transactions on Nanobioscience, vol. 7, no. 1, pp. 44–55, 2008.
[28]
L. Qian, H. Wang, and E. R. Dougherty, “Inference of noisy nonlinear differential equation models for gene regulatory networks using genetic programming and Kalman filtering,” IEEE Transactions on Signal Processing, vol. 56, no. 7, pp. 3327–3339, 2008.
[29]
Z. Wang, X. Liu, Y. Liu, J. Liang, and V. Vinciotti, “An extended Kalman filtering approach to modeling nonlinear dynamic gene regulatory networks via short gene expression time series,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 3, pp. 410–419, 2009.
[30]
A. Noor, E. Serpedin, M. Nounou, and H. Nounou, “Inferring gene regulatory networks via nonlinear state-space models and exploiting sparsity,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 4, pp. 1203–1211, 2012.
[31]
N. H. Lee, “Genomic approaches for reconstructing gene networks,” Pharmacogenomics, vol. 6, no. 3, pp. 245–258, 2005.
[32]
S. Mehra, W. S. Hu, and G. Karypis, “A Boolean algorithm for reconstructing the structure of regulatory networks,” Metabolic Engineering, vol. 6, no. 4, pp. 326–339, 2004.
[33]
L. A. Soinov, M. A. Krestyaninova, and A. Brazma, “Towards reconstruction of gene networks from expression data by supervised learning,” Genome Biology, vol. 4, no. 1, article R6, 2003.
[34]
G. J. Hickman and T. C. Hodgman, “Inference of gene regulatory networks using Boolean-network inference methods,” Journal of Bioinformatics and Computational Biology, vol. 7, no. 6, pp. 1013–1029, 2009.
[35]
S. C. Madeira, M. C. Teixeira, I. Sá-Correia, and A. L. Oliveira, “Identification of regulatory modules in time series gene expression data using a linear time biclustering algorithm,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 153–165, 2010.
[36]
Y. Xiao, “A tutorial on analysis and simulation of Boolean gene regulatory network models,” Current Genomics, vol. 10, no. 7, pp. 511–525, 2009.
[37]
H. Kim, J. K. Lee, and T. Park, “Boolean networks using the chi-square test for inferring large-scale gene regulatory networks,” BMC Bioinformatics, vol. 8, article 37, 2007.
[38]
K. P. Murphy, Dynamic Bayesian Networks: Representation, Inference and Learning, Computer Science, University of California, Berkeley, Calif, USA, 2002.
[39]
L. A. Zadeh, “Fuzzy sets,” Information and Control, vol. 8, no. 3, pp. 338–353, 1965.
[40]
E. Ruspini, P. Bonissone, and W. Pedrycz, Handbook of Fuzzy Computation, IOP Publishing/Institute of Physics, 1998.
[41]
E. Cox, The Fuzzy Systems Handbook, AP Professional, New York, NY, USA, 1994.
[42]
S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, New York, NY, USA, 1999.
[43]
K. Mehrotra, C. K. Mohan, and S. Ranka, Elements of Artificial Neural Networks, MIT Press, Boston, Mass, USA, 1997.
[44]
M. Kanehisa, S. Goto, M. Furumichi, M. Tanabe, and M. Hirakawa, “KEGG for representation and analysis of molecular networks involving diseases and drugs,” Nucleic Acids Research, vol. 38, no. 1, pp. D355–D360, 2009.
[45]
M. Kanehisa, S. Goto, Y. Sato, M. Furumichi, and M. Tanabe, “KEGG for integration and interpretation of large-scale molecular datasets,” Nucleic Acids Research, vol. 40, no. 1, pp. D109–D114, 2012.
[46]
M. C. Costanzo, J. D. Hogan, M. E. Cusick et al., “The yeast proteome database (YPD) and Caenorhabditis elegans proteome database (WormPD): comprehensive resources for the organization and comparison of model organism protein information,” Nucleic Acids Research, vol. 28, no. 1, pp. 73–76, 2000.
[47]
P. T. Spellman, G. Sherlock, M. Q. Zhang et al., “Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization,” Molecular Biology of the Cell, vol. 9, no. 12, pp. 3273–3297, 1998.
[48]
R. J. Cho, M. J. Campbell, E. A. Winzeler et al., “A genome-wide transcriptional analysis of the mitotic cell cycle,” Molecular Cell, vol. 2, no. 1, pp. 65–73, 1998.
[49]
L. Zeng, J. Wu, and J. Xie, “Statistical methods in integrative analysis for gene regulatory modules,” Statistical Applications in Genetics and Molecular Biology, vol. 7, no. 1, article 28, 2008.
[50]
M. S. Shotwell and E. H. Slate, “Bayesian outlier detection with Dirichlet process mixtures,” Bayesian Analysis, vol. 6, no. 4, pp. 665–690, 2011.
[51]
B. Grün, T. Scharl, and F. Leisch, “Modelling time course gene expression data with finite mixtures of linear additive models,” Bioinformatics, vol. 28, no. 2, pp. 222–228, 2012.
[52]
L. P. Tian, L. Z. Liu, Q. W. Zhang, and F. X. Wu, “Nonlinear model-based method for clustering periodically expressed genes,” The Scientific World Journal, vol. 11, pp. 2051–2061, 2011.
[53]
M. Böck, S. Ogishima, H. Tanaka, S. Kramer, and L. Kaderali, “Hub-centered gene network reconstruction using automatic relevance determination,” PLoS ONE, vol. 7, no. 5, Article ID e35077, 2012.
[54]
O. Troyanskaya, M. Cantor, G. Sherlock et al., “Missing value estimation methods for DNA microarrays,” Bioinformatics, vol. 17, no. 6, pp. 520–525, 2001.
[55]
T. I. Lee, N. J. Rinaldi, F. Robert et al., “Transcriptional regulatory networks in Saccharomyces cerevisiae,” Science, vol. 298, no. 5594, pp. 763–764, 2002.
[56]
KEGG: Kyoto Encyclopedia of Genes and Genomes, http://www.genome.jp/kegg/.
[57]
S. Kim, S. Imoto, and S. Miyano, “Dynamic Bayesian network and nonparametric regression for nonlinear modeling of gene networks from time series gene expression data,” BioSystems, vol. 75, no. 1–3, pp. 57–65, 2004.
[58]
I. M. Tienda-Luna, M. C. C. Perez, D. P. R. Padillo, Y. Yin, and Y. Huang, “Sensitivity and specificity of inferring genetic regulatory interactions with the VBEM algorithm,” IADIS International Journal on Computer Science and Information Systems, vol. 4, no. 1, pp. 54–63, 2009.
[59]
P. Zoppoli, S. Morganella, and M. Ceccarelli, “TimeDelay-ARACNE: reverse engineering of gene networks from time-course data by an information theoretic approach,” BMC Bioinformatics, vol. 11, article 154, 2010.
[60]
F. Emmert-Streib, “Statistical inference and reverse engineering of gene regulatory networks from observational expression data,” Frontiers in Genetics, vol. 3, article 8, 2012.