|
BMC Systems Biology 2010
Searching for functional gene modules with interaction component models

Abstract: We formulate a generative probabilistic model for protein-protein interaction links and introduce two ways of including gene expression data in the model. The model finds interaction components, which can be interpreted as overlapping clusters or functional modules. We demonstrate the performance on two data sets from the yeast Saccharomyces cerevisiae. Our methods outperform a representative set of earlier models in the task of finding biologically relevant modules with enriched functional classes.

Combining protein interaction and gene expression data with a probabilistic generative model improves discovery of modules compared to approaches based on either data source alone. With a fairly simple model we can find biologically relevant modules better than with alternative methods, and in addition the modules may be inherently overlapping in the sense that different interactions may belong to different modules.

Searching for hypotheses about functional gene modules, co-regulated sets of genes and protein complexes, has been the subject of intensive research given current high-throughput data acquisition methods. Traditionally only a single data type, either gene expression or protein-protein interaction (PPI) data, has been used (see for example [1,2]). Recently, methods for combining relational interaction data and functional gene expression data have also been studied, for example [3,4].

Ulitsky and Shamir [5] recently used similarities between gene expression patterns as a kind of interaction data between proteins. They combined these interactions with protein-protein interaction measurements in order to seek Jointly Active Connected Subnetworks (JACS). Their novel computational method, called Matisse, found biologically relevant modules better than a set of earlier methods (e.g. Co-clustering [6] and CLICK [7]).

Another recent method [8] uses a protein-protein interaction network to form prior constraints on the clustering of gene expression data. The method is an extension of Mark
|