|
BMC Systems Biology 2009
From gene expression to gene regulatory networks in Arabidopsis thaliana

Abstract: A novel Bayesian network-based algorithm to infer gene regulatory networks from gene expression data is introduced and applied to learn parts of the transcriptomic network in Arabidopsis thaliana from a large number (thousands) of separate microarray experiments. Starting from an initial set of genes of interest, a network is grown by iterative addition to the model of the gene, from another defined set of genes, which gives the 'best' learned network structure. The gene set for iterative growth can be as large as the entire genome. A number of networks are inferred and analysed; these show (i) an agreement with the current literature on the circadian clock network, (ii) the ability to model other networks, and (iii) that the learned network hypotheses can suggest new roles for poorly characterized genes, through addition of relevant genes from an unconstrained list of over 15,000 possible genes. To demonstrate the latter point, the method is used to suggest that particular GATA transcription factors are regulators of photosynthetic genes. Additionally, the performance in recovering a known network from different amounts of synthetically generated data is evaluated.

Our results show that plausible regulatory networks can be learned from such gene expression data alone. This work demonstrates that network hypotheses can be generated from existing gene expression data for use by experimental biologists.

Much of molecular biology aims to decipher the mechanisms organisms use to modulate their gene expression patterns. This has been greatly facilitated by genome sequencing and subsequent design of microarrays allowing determination of gene expression patterns with near full-genome coverage.
While individual array experiments can be examined for differential expression of genes of interest, this may be misleading owing to irreproducibility, and only uses a small fraction of the data often available. Further analysis of large amounts of microarray data en masse to calculate
|