|
A shortest-path graph kernel for estimating gene product semantic similarity

Abstract: We present a shortest-path graph kernel (spgk) method that relies exclusively on the GO and its structure. In spgk, a gene product is represented by an induced subgraph of the GO, which consists of all the GO terms annotating it. A shortest-path graph kernel is then used to compute the similarity between two such graphs. In a comprehensive evaluation using a benchmark dataset, spgk compares favorably with other methods that depend on external resources. Compared with simUI, a method that is also intrinsic to GO, spgk achieves slightly better results on the benchmark dataset. Statistical tests show that the improvement is significant when the resolution and the EC similarity correlation coefficient are used to measure performance, but is not significant when the Pfam similarity correlation coefficient is used.

Spgk uses a polynomial-time graph kernel method to exploit the structure of the GO when calculating semantic similarity between gene products. It provides an alternative, with comparable performance, to both methods that use external resources and "intrinsic" methods.

The Gene Ontology (GO) [1] systematically organizes knowledge by means of well-structured controlled vocabularies and provides consistent descriptions to organisms across species. GO terms have been widely used to annotate genes and gene products in the Gene Ontology Annotation (GOA) project [2]. As the GO becomes increasingly important in biomedical research, computational methods are often needed to explore the GO and calculate the semantic similarity between gene products.
Such methods have been used in a broad range of applications, including: clustering of genes in pathways [3-6], prediction of protein-protein interactions [7], and the evaluation of similarity between gene products with respect to expression profiles [8], protein sequence [9-11], protein function [12], and protein family [13].

The semantic similarity between two gene products is usually calculated based on the term similarity. First, pair
|
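The idea summarized in the abstract, representing each gene product as an induced GO subgraph and comparing two subgraphs with a shortest-path kernel, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not the authors' implementation: the toy ontology `TOY_GO` and its term names are hypothetical, the induced subgraph is assumed to contain the annotating terms together with all of their ancestors, node labels and path lengths are compared with simple equality (delta) kernels, shortest paths are found by breadth-first search, and the kernel value is cosine-normalized to lie in [0, 1].

```python
from collections import deque
from math import sqrt

# Hypothetical toy ontology (not real GO terms): child term -> parent terms.
TOY_GO = {
    "T_root": [],
    "T_a": ["T_root"],
    "T_b": ["T_root"],
    "T_c": ["T_a"],
    "T_d": ["T_a", "T_b"],
}


def induced_subgraph(annotations, parents=TOY_GO):
    """Return {term: [parents]} for the annotating terms plus all ancestors.

    Including ancestors is an assumption of this sketch.
    """
    nodes, stack = set(), list(annotations)
    while stack:
        term = stack.pop()
        if term not in nodes:
            nodes.add(term)
            stack.extend(parents[term])
    return {term: list(parents[term]) for term in nodes}


def shortest_path_lengths(graph):
    """Map (source, target) -> shortest directed path length, via BFS per node."""
    lengths = {}
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for dst, d in dist.items():
            if dst != src:
                lengths[(src, dst)] = d
    return lengths


def sp_kernel(g1, g2):
    """Count shortest paths with the same endpoints and the same length."""
    p1, p2 = shortest_path_lengths(g1), shortest_path_lengths(g2)
    return sum(1 for pair, d in p1.items() if p2.get(pair) == d)


def spgk_similarity(ann1, ann2):
    """Cosine-normalized shortest-path kernel between two annotation sets."""
    g1, g2 = induced_subgraph(ann1), induced_subgraph(ann2)
    k11, k22 = sp_kernel(g1, g1), sp_kernel(g2, g2)
    if k11 == 0 or k22 == 0:
        return 0.0  # a graph without any path cannot be normalized
    return sp_kernel(g1, g2) / sqrt(k11 * k22)
```

For example, `spgk_similarity(["T_c"], ["T_d"])` compares a gene product annotated with `T_c` to one annotated with `T_d`; in this toy ontology the two subgraphs share only the path from `T_a` to `T_root`. Running one breadth-first search per node keeps the whole computation polynomial in the size of the subgraphs.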