|
Genome Biology 2006
PhyloFacts: an online structural phylogenomic encyclopedia for protein functional and structural classification

Abstract: Computational methods for protein function prediction have been critical in the post-genome era in the functional annotation of literally millions of novel sequences. The standard protocol for sequence functional annotation - transferring the annotation of a database hit to a sequence 'query' based on predicted homology - has been shown to be prone to systematic error [1-3]. The top hit in a sequence database may have a different function to the query due to neofunctionalization stemming from gene duplication [4], differences in domain structure [5,6], mutations at key functional positions, or speciation [1]. Annotation errors have been shown to propagate through databases by the application of homology-based annotation transfer [7-9]. While the exact frequency of annotation error is unknown (one published estimate is 8% or higher [7]), the importance of detecting and correcting existing errors and preventing future errors is undisputed.

An additional complicating factor in annotation transfer by homology is the complete failure of this approach for an average of 30% of the genes in most genomes sequenced: in some cases no homologs can be detected within a particular significance threshold, for instance, a BLAST [10] expectation (E) value (that is, the number of hits receiving a given score expected by chance alone in the database searched) of 0.001 or less, while in other cases database hits may be labeled as 'hypothetical' or 'unknown'.

With the huge array of bioinformatics software tools and resources available, it might seem unthinkable that functional annotation accuracy would be so difficult to ensure.
Rather like the parable of the blind men and the elephant, each tool used separately provides a partial and imperfect picture; taken as a whole, the probable molecular function of the protein, biological process, cellular component, interacting partners, and other aspects of a protein's function can often come into better focus. For instance, annotation transfer f
|