BMC Bioinformatics 2008
Gene function prediction using labeled and unlabeled data

Abstract: In this paper, we present a new technique, namely Annotating Genes with Positive Samples (AGPS), for defining negative samples in gene function prediction. With the negative samples defined, it is straightforward to predict the functions of unknown genes. In addition, the AGPS algorithm can integrate various kinds of data sources to predict gene functions reliably and accurately. With one-class and two-class Support Vector Machines as the core learning algorithms, the AGPS algorithm performs well for function prediction on yeast genes.

We proposed a new method for defining negative samples in gene function prediction. Experimental results on yeast genes show that AGPS performs well on both training and test sets. In addition, the overlap between prediction results and GO annotations on unknown genes also demonstrates the effectiveness of the proposed method.

One of the main goals in the post-genomic era is to predict the biological functions of genes. Recently, with the rapid advance of high-throughput biotechnologies, such as yeast two-hybrid systems [1], protein complex analysis [2,3] and microarray expression profiling [4], a large amount of biological data has been generated. These data are rich sources for deducing and understanding gene functions. For example, protein-protein interaction data are widely exploited for inferring gene functions under the assumption that interacting proteins have the same or similar functions, i.e. the "guilt by association" rule [5-10]. In addition, gene expression data have been widely used for gene function prediction, where genes with similar expression patterns are assumed to have similar functions [11]. It has also been shown in the literature that integrating different kinds of data sources can considerably improve prediction results [12-15].
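As an illustration of the "guilt by association" rule mentioned above, the following minimal Python sketch assigns an unannotated gene the function that occurs most often among its annotated interaction partners. It is not the AGPS algorithm and is not taken from the cited works; the function name `predict_by_association`, the gene names, and the annotations are all hypothetical toy values chosen for the example.

```python
from collections import Counter


def predict_by_association(gene, interactions, annotations):
    """Return the most common function among the annotated interaction
    partners of `gene`, or None if no partner is annotated.

    Hypothetical helper for illustration only.
    """
    # Collect every gene that shares an interaction with the query gene.
    partners = set()
    for a, b in interactions:
        if a == gene:
            partners.add(b)
        elif b == gene:
            partners.add(a)

    # Each annotated partner votes once for each of its functions.
    votes = Counter()
    for partner in partners:
        for function in annotations.get(partner, ()):
            votes[function] += 1

    if not votes:
        return None

    # Break ties alphabetically so the result is deterministic.
    best = max(votes.values())
    return sorted(f for f, count in votes.items() if count == best)[0]


# Toy interaction network and annotations (assumed values, not real data).
interactions = [
    ("geneA", "geneX"),
    ("geneB", "geneX"),
    ("geneC", "geneX"),
    ("geneC", "geneD"),
]
annotations = {
    "geneA": {"DNA repair"},
    "geneB": {"DNA repair"},
    "geneC": {"transport"},
}

prediction = predict_by_association("geneX", interactions, annotations)
print(prediction)
```

A simple majority vote like this only uses positive annotations of neighbors; it makes no statement about which functions a gene does not have, which is the gap that defining negative samples is meant to address.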
With various kinds of high-throughput data available, machine learning techniques, especially Support Vector Machines (SVMs), have been used for predicting gene