BMC Bioinformatics 2005
Evaluating eukaryotic secreted protein prediction

Abstract: Prediction accuracies were evaluated using 372 unbiased eukaryotic SwissProt protein sequences. TargetP, the SignalP 3.0 maximum S-score, and the SignalP 3.0 D-score were the most accurate single scores (90–91% accurate). Combining a positive TargetP prediction, the SignalP 2.0 maximum Y-score, and the SignalP 3.0 maximum S-score increased accuracy by six percent. Single predictive scores could be highly accurate, but almost all accuracies were slightly lower than those reported by the program authors. Predictive accuracy could be substantially improved by combining scores from multiple methods into a single composite prediction.

Predicting secreted proteins from primary sequence is a major component of automated protein annotation and is critical to a wide range of studies. Eukaryotic secreted proteins play roles in cell-to-cell communication, cellular differentiation, morphological development, and cellular response to disease. Researchers use them to study embryology, tumor marker detection, and agricultural animal performance. Many software tools have been developed for ab initio prediction of cellular localization, using machine learning techniques such as neural networks, hidden Markov models, and support vector machines. Identifying the program best suited to a researcher's needs requires familiarity with several different programs. Prediction accuracy depends on the methods a program employs and on the integrity of the data used to develop it. Additionally, programs must be compared on an independent protein sequence set, because the performance characteristics reported by program authors are often inflated [1].

The ambiguity of the terminology used to describe and label secreted proteins often causes confusion about exactly what type of protein is being predicted or discussed.
To eliminate this confusion, biologically concrete labels are used here in place of the terms "secreted protein" and "secretory protein". Proteins possessing an N-terminal signal sequenc