|
BMC Bioinformatics 2012
BLANNOTATOR: enhanced homology-based function prediction of bacterial proteins

Abstract: We present an automated method for the functional annotation of bacterial protein sequences. Based on sequence similarity searches, BLANNOTATOR accurately annotates query sequences with one-line summary descriptions of protein function. It groups sequences identified by BLAST into subsets according to their annotation and bases its prediction on a set of sequences with consistent functional information. We show the results of BLANNOTATOR's performance in sets of bacterial proteins with known functions. We simulated the annotation process for 3090 SWISS-PROT proteins using a database in its state preceding the functional characterisation of the query protein. For this dataset, our method outperformed the five others that we tested, and the improved performance was maintained even in the absence of highly related sequence hits. We further demonstrate the value of our tool by analysing the putative proteome of Lactobacillus crispatus strain ST1.

BLANNOTATOR is an accurate method for bacterial protein function prediction. It is practical for genome-scale data and does not require pre-existing sequence clustering; thus, this method suits the needs of bacterial genome and metagenome researchers. The method and a web-server are available at http://ekhidna.biocenter.helsinki.fi/poxo/blannotator/.

The rapid progress in sequencing technology has enabled the generation of unimaginable amounts of bacterial genomic data. The genome sequences of thousands of bacteria have been determined, and many more are in progress [1]. In addition, enormous numbers of sequences have been produced in metagenomic studies exploring the genomic contents of microbial communities by sequencing [2].
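The grouping idea summarised in the abstract (collect BLAST hits into subsets that share an annotation, then predict from one consistent subset) can be illustrated with a minimal Python sketch. This is not the BLANNOTATOR implementation: the text above does not specify how descriptions are compared or how subsets are ranked, so the `normalise` helper, the exact-match grouping, and the summed bit-score ranking are all assumptions made purely for illustration.

```python
import re
from collections import defaultdict


def normalise(description):
    """Hypothetical normalisation: lowercase, collapse punctuation and spaces."""
    return re.sub(r"[^a-z0-9]+", " ", description.lower()).strip()


def predict_description(hits):
    """Pick a one-line description from BLAST hits.

    hits: iterable of (description, bit_score) pairs.
    Hits whose normalised descriptions are identical form one subset.
    Subsets are ranked by summed bit score (an assumed rule, not the
    published one), and the description of the highest-scoring hit in
    the best subset is returned. Returns None when there are no hits.
    """
    groups = defaultdict(list)
    for description, bit_score in hits:
        groups[normalise(description)].append((bit_score, description))
    if not groups:
        return None
    best = max(groups.values(), key=lambda members: sum(s for s, _ in members))
    return max(best, key=lambda member: member[0])[1]


# Made-up example hits, for illustration only.
example_hits = [
    ("Hypothetical protein", 310.0),
    ("L-lactate dehydrogenase", 250.0),
    ("L-lactate dehydrogenase", 240.0),
    ("l-lactate dehydrogenase.", 200.0),
]
print(predict_description(example_hits))
```

In this toy input the single top-scoring hit carries an uninformative description, while three lower-scoring hits agree with each other; ranking by subset rather than by individual hit is what lets the consistent annotation win.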
The interpretation of these data is necessarily based on computational analysis, as only a minority of the predicted protein-coding sequences are experimentally characterised or tested with functional genomics assays. Functional inferences for the large majority of putative proteins therefore requi
|