%0 Journal Article
%T Prediction of enzyme function by combining sequence similarity and protein interactions
%A Jordi Espadaler
%A Narayanan Eswar
%A Enrique Querol
%A Francesc X Avilés
%A Andrej Sali
%A Marc A Marti-Renom
%A Baldomero Oliva
%J BMC Bioinformatics
%D 2008
%I BioMed Central
%R 10.1186/1471-2105-9-249
%X The method has been tested against the PSI-BLAST program using a set of 3,890 protein sequences for which interaction data were available. For protein sequences that align with at least 40% sequence identity to a known enzyme, the specificity of our method in predicting the first three EC digits increased from 80% to 90% at 80% coverage when compared to PSI-BLAST. Our method can also be applied to proteins for which homologous sequences with known interacting partners can be detected. Thus, our method could increase the specificity of genome-wide enzyme predictions based on sequence matching by PSI-BLAST alone by 10 percentage points. While the amount of genome sequence information is increasing exponentially, the annotation of protein sequences remains a problem, both in terms of quality and quantity [1]. Bioinformatics-based annotation of uncharacterized proteins is still one of the most challenging problems in biology [2]. The classical approach involves transferring annotation from a functionally characterized protein to its functionally uncharacterized homologs. Although several studies have highlighted the limitations of such methods [1,3,4], they have been used extensively to annotate proteins, in particular enzymes [5,6]. About half of all proteins with experimentally characterized functions have enzymatic activity, making enzymes the largest single class of proteins [5]. The Enzyme Commission (EC) uses four numbers (integers) separated by periods to classify the functions of enzymes [7]. The first three digits describe the overall type of an enzymatic reaction, while the last digit represents the substrate specificity of the catalyzed reaction. The accuracy of transferring an enzymatic annotation between two globally aligned protein sequences has been reported to drop significantly below 60% sequence identity [6].
To address this limitation, we introduce, for the first time, an approach that combines sequence similarity search and comparative protein interaction data to increase the c
%U http://www.biomedcentral.com/1471-2105/9/249