
BMC Bioinformatics 2005

Automatic detection of false annotations via binary property clustering

DOI: 10.1186/1471-2105-6-46

Noam Kaplan, Michal Linial


Abstract:

Using a test set of all PROSITE signatures with matches marked as FPs, we show that the method successfully separates FPs in 69% of the 327 test cases supplied by PROSITE. Furthermore, we constructed an extensive simulation test with randomly introduced FPs and show a high degree of success in detecting them, indicating that the method is not specifically tuned for PROSITE and performs well on larger scales. We also suggest some means of predicting in which cases this approach will be successful.

Automatic detection of FPs may greatly facilitate the manual validation process and increase annotation sensitivity. With the increasing number of automatic annotations, the tendency of biological properties to cluster once a biological similarity measure is introduced may become exceedingly helpful in developing such automatic methods.

Computational protein annotation is a major goal of bioinformatics, and annotation methods are widely used. A wide variety of annotation methods exist, many of which rely on some kind of scoring. Typically, when testing whether a protein should be given a certain annotation, a score threshold is set, and proteins that score above the threshold are given the annotation. Inevitably, some annotation mistakes occur. Such mistakes can be divided into false positives (FPs) and false negatives (FNs). FPs (or false hits) are annotations that were mistakenly assigned to a protein (type I errors). FNs (or misses) are annotations that should have been assigned to a protein but were not (type II errors). Adjusting the score threshold trades one type of mistake against the other. FP annotations are considered to have graver consequences than FNs, partly because a false positive annotation introduced into a protein database may cause other proteins to be incorrectly annotated on the basis of sequence similarity [1,2]. A systematic evaluation of the source of false annotations that have already contaminated current databases…
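The threshold tradeoff between FPs and FNs described above can be illustrated with a minimal sketch. The protein IDs, scores, ground-truth set and the strict "score above threshold" rule are hypothetical values chosen for illustration only. This is not the paper's binary property clustering method, which this excerpt does not detail; it only shows how moving a single score threshold shifts errors between type I and type II.

```python
from typing import Dict, Set, Tuple


def annotate(scores: Dict[str, float], threshold: float) -> Set[str]:
    """Assign the annotation to every protein whose score is above the threshold."""
    return {pid for pid, s in scores.items() if s > threshold}


def count_errors(predicted: Set[str], true_members: Set[str]) -> Tuple[int, int]:
    """Return (false positives, false negatives) for one annotation."""
    fp = len(predicted - true_members)  # type I: assigned but wrong
    fn = len(true_members - predicted)  # type II: should be assigned but missed
    return fp, fn


# Hypothetical scores and ground truth, for illustration only.
scores = {"P1": 9.1, "P2": 7.4, "P3": 6.0, "P4": 5.2, "P5": 3.8, "P6": 2.5}
true_members = {"P1", "P2", "P4", "P5"}

if __name__ == "__main__":
    # Raising the threshold lowers FPs at the cost of more FNs.
    for t in (2.0, 4.0, 6.5):
        fp, fn = count_errors(annotate(scores, t), true_members)
        print(f"threshold={t}: FP={fp}, FN={fn}")
```

With these made-up numbers the loop prints FP=2, FN=0 at threshold 2.0; FP=1, FN=1 at 4.0; and FP=0, FN=2 at 6.5. No threshold removes both error types at once when a false hit (P3) outscores a true member (P4, P5).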
