Abstract:
In this paper, we explore the performance of five factor analysis algorithms, Bayesian as well as classical, on problems with biological context using both simulated and real data. Factor analysis (FA) models describe a larger number of observed variables by a smaller number of unobserved variables, the factors, whereby all correlation between observed variables is explained by common factors. Bayesian FA methods allow one to infer sparse networks by enforcing sparsity through priors. In contrast, classical FA uses matrix rotation methods to enforce sparsity and thus to increase the interpretability of the inferred factor loadings matrix. However, we also show that Bayesian FA models that do not impose sparsity through the priors can still be used for the reconstruction of a gene regulatory network if applied in conjunction with matrix rotation methods. Finally, we show the added advantage of merging the information derived from all algorithms in order to obtain a combined result.

Most of the algorithms tested are successful in reconstructing the connectivity structure as well as the transcription factor (TF) profiles. Moreover, we demonstrate that if the underlying network is sparse it is still possible to reconstruct hidden activity profiles of TFs to some degree without prior connectivity information.

Factor analysis (FA), like principal component analysis (PCA), is used to describe a number of observed variables by a smaller number of unobserved variables. Unlike PCA, FA also includes independent additive measurement errors on the observed variables. FA assumes that the observed variables become uncorrelated given a set of hidden variables called factors. It can also be seen as a clustering method in which the variables described by the same factors are highly correlated, and thus belong to the same cluster, while the variables depending on different factors are uncorrelated and placed in different clusters.

FA has been successfully used in a number of ar
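
The FA model described above can be written compactly in standard textbook notation. The symbols are assumptions introduced here for illustration and are not taken from the text: x is the vector of p observed variables, f the vector of k < p factors, Λ the factor loadings matrix, and Ψ the diagonal covariance of the independent measurement errors.

```latex
\begin{align}
\mathbf{x} &= \boldsymbol{\Lambda}\,\mathbf{f} + \boldsymbol{\varepsilon},
  \qquad \mathbf{f} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_k),
  \qquad \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Psi}),
  \qquad \boldsymbol{\Psi} = \operatorname{diag}(\psi_1, \dots, \psi_p), \\
\operatorname{Cov}(\mathbf{x} \mid \mathbf{f}) &= \boldsymbol{\Psi},
  \qquad
\operatorname{Cov}(\mathbf{x}) = \boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Psi}, \\
(\boldsymbol{\Lambda}\mathbf{R})(\boldsymbol{\Lambda}\mathbf{R})^{\top} &= \boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\top}
  \qquad \text{for any orthogonal } \mathbf{R}.
\end{align}
```

The second line states that the observed variables are uncorrelated given the factors, since Ψ is diagonal. The third line shows that the loadings are identified only up to an orthogonal rotation, which is the freedom that matrix rotation methods use to pick a sparser, more interpretable Λ.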

Abstract:
Genes are transcribed into mRNAs, which in turn are translated into proteins. Some of these proteins act as transcription factors (TFs) that activate or inhibit the transcription of a number of other genes, creating a complex gene regulatory network. The number of transcription factors is believed to be much smaller than the number of regulated genes. Moreover, most genes are known to be regulated by only a very restricted number of transcription factors. This induces a sparse connectivity matrix for the representation of the connections between the TFs and the regulated genes. Microarray experiments measure the expression level of thousands of genes simultaneously. Unfortunately, a similar method that would allow us to measure simultaneously the abundance or activities of a large number of proteins that act as TFs is not yet available. Some progress has been made with measurements of protein abundance by flow cytometry [1], following a dozen or so proteins of interest which need to be identified in advance. Still, such experiments are less available than gene expression experiments and cannot compete in terms of the number of tracked genes. ChIP-on-chip experiments, on the other hand, provide only static binding information about transcription factors. Thus, current approaches that use microarray experiments make a strong assumption: the protein levels are proportional to the mRNA levels. This assumption is not necessarily true due to the complexity of transcription, translation, and post-translational modification. In more recent studies, two-level networks have been studied with hidden profiles of the transcription factors at the top level and the observed expression levels of the regulated genes at the lower level. Some of these studies [2–4] are concerned with factor analysis algorithms.

Factor analysis (FA) is often used as a dimensionality reduction approach assuming that the large number of observed variables becomes uncorrelated given a much smaller number of hidden var

Abstract:
Two-level gene regulatory networks consist of the transcription factors (TFs) in the top level and their regulated genes in the second level. The expression profiles of the regulated genes are the observed high-throughput data given by experiments such as microarrays. The activity profiles of the TFs are treated as hidden variables, as is the connectivity matrix that indicates the regulatory relationships of TFs with their regulated genes. Factor analysis (FA) and other methods, such as the network component algorithm, have been suggested for reconstructing gene regulatory networks and also for predicting TF activities. They have been applied to E. coli and yeast data with the assumption that these datasets consist of independent and identically distributed samples. Thus, the main drawback of these algorithms is that they ignore any time correlation existing within the TF profiles. In this paper, we extend previously studied FA algorithms to include time correlation within the transcription factors. At the same time, we consider connectivity matrices that are sparse in order to capture the sparsity present in gene regulatory networks. The TF activity profiles obtained by this approach are significantly smoother than profiles from previous FA algorithms. The periodicities in profiles from yeast expression data become prominent in our reconstruction. Moreover, the strength of the correlation between time points is estimated and can be used to assess the suitability of the experimental time interval.
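
To make the two-level setup concrete, the following is a minimal simulation sketch, not the algorithm of the paper. It assumes a first-order autoregressive (AR(1)) process for the hidden TF profiles as one simple way of introducing time correlation; the function names, parameter names, and default values are all hypothetical. Setting `rho=0` recovers the independent-samples assumption of earlier FA approaches.

```python
import random


def simulate_two_level_network(n_genes=8, n_tfs=2, n_times=50, rho=0.9,
                               sparsity=0.6, noise_sd=0.1, seed=0):
    """Simulate expression data x_t = A f_t + e_t with AR(1) TF profiles.

    The AR(1) form and every parameter here are illustrative assumptions.
    Returns (A, F, X): connectivity matrix, TF profiles, expression data.
    """
    rng = random.Random(seed)
    # Sparse connectivity matrix: each entry is zero with probability `sparsity`.
    A = [[0.0 if rng.random() < sparsity else rng.gauss(0.0, 1.0)
          for _ in range(n_tfs)] for _ in range(n_genes)]
    # AR(1) TF profiles: f_t = rho * f_{t-1} + sqrt(1 - rho^2) * w_t.
    # This keeps unit marginal variance, with correlation rho between
    # neighbouring time points.
    innov_sd = (1.0 - rho ** 2) ** 0.5
    F = []
    current = [rng.gauss(0.0, 1.0) for _ in range(n_tfs)]
    for _ in range(n_times):
        F.append(current)
        current = [rho * f + innov_sd * rng.gauss(0.0, 1.0) for f in current]
    # Observed expression: linear mix of TF activities plus independent noise.
    X = [[sum(A[g][k] * F[t][k] for k in range(n_tfs)) + rng.gauss(0.0, noise_sd)
          for g in range(n_genes)] for t in range(n_times)]
    return A, F, X


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a 1-D sequence."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in series)
    return num / den
```

Applying `lag1_autocorrelation` to one column of `F` gives a rough empirical check of the correlation strength between time points, which is the kind of quantity the abstract proposes for judging whether the sampling interval is suitable.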

Abstract:
We focus on four commonly occurring network motif structures and show that it is possible to differentiate between them using simulated data and any of the model comparison methods tested. We expand one of the motifs, the feed-forward (FF) motif, for several possible parameterisations and apply model selection on simulated data. We then use experimental data on three biosynthetic pathways in Escherichia coli to formally assess how current knowledge matches the time series available. Our analysis confirms two of them as FF motifs. Only an expanded set of FF motif parameterisations using time delays is able to fit the third pathway, indicating that the true mechanism might be more complex in this case.

Maximum likelihood as well as Bayesian model comparison methods are suitable for selecting a plausible motif model among a set of candidate models. Our work shows that it is practical to apply model comparison to test ideas about underlying mechanisms of biological pathways in a formal and quantitative way.

Cellular processes are very complex, but it seems that such processes can often be broken down into a small number of recurring patterns of interconnections known as network motifs [1,2]. Interestingly, some motifs are known to display specific dynamic functional roles [3,4]. Motif dynamics can now be assessed in a precise manner thanks to the emergence of new experimental techniques that generate high-quality time series data with a high temporal sampling rate [5-9]. However, studying biological systems in general involves two steps: first, the components of the network need to be identified, and then the type of relationships between them established. Different methods exist for doing so. While some have focused on deriving pairs of possible interacting molecules from existing databases [1], others have tried to reconstruct networks from scratch, integrating different sources of both static and dynamic data [10]. In fact, automatic identification of interacti
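
As a rough illustration of selecting among candidate motif models, the sketch below ranks models by the Bayesian information criterion (BIC) computed from each model's maximised log-likelihood. The text does not name the specific comparison methods used, so AIC and BIC are assumed stand-ins here, and the function names and input format are hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_model(candidates, n_obs):
    """Pick the candidate with the lowest BIC.

    `candidates` maps a model name to a tuple
    (maximised log-likelihood, number of free parameters).
    Returns (best_name, {name: bic_score}).
    """
    scores = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

Both criteria penalise extra parameters, so a motif with more free parameters is preferred only when it improves the fit to the time series by more than the penalty. A fully Bayesian comparison would instead use marginal likelihoods, which this sketch does not compute.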