Application Research of Computers (计算机应用研究), 2011
Protein secondary structure co-training prediction method
Abstract:
Machine-learning-based methods for protein secondary structure prediction suffer from low prediction accuracy because they ignore the hydrophobic properties of amino acids and the interactions between amino acids that are far apart in the sequence. A hydrophobic value sequence can be built by replacing each amino acid with its hydrophobic value. Experiments show that a BP neural network using long amino acid hydrophobic value sequences performs well in predicting the E structure, which is governed mainly by long-range interactions between amino acids. Because the Profile space and the hydrophobic energy value space are both sufficient and redundant views, this paper proposes a co-training algorithm. The proposed algorithm uses two classifiers: an SVM classifier trained in the Profile space and a BP neural network classifier trained in the hydrophobic value space. Each classifier predicts the secondary structure of an amino acid independently. If the two classifiers give different predictions for the same amino acid, an arbitration rule proposed in this paper, based on an active selection strategy, makes the final decision. Suspected samples and creditable samples are defined according to the characteristics of the classifiers and the spaces, and are used to arbitrate the conflicting predictions. Experimental results show that the proposed algorithm achieves higher prediction accuracy than existing algorithms for both the E structure, which is governed mainly by long-range interactions, and the H structure, which is governed mainly by short-range interactions.
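The hydrophobic value sequence described above can be illustrated with a minimal Python sketch. The abstract names neither the hydrophobicity scale nor the window length, so the Kyte-Doolittle hydropathy scale, the function names `hydrophobic_sequence` and `sliding_windows`, the zero padding, and the `half_width` parameter are all assumptions made for illustration, not the paper's actual settings.

```python
# Kyte-Doolittle hydropathy values (ASSUMED scale; the abstract does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_sequence(protein, scale=KYTE_DOOLITTLE, unknown=0.0):
    """Replace each residue letter with its hydrophobic value.

    Residues missing from the scale map to `unknown` (a hypothetical default).
    """
    return [scale.get(residue, unknown) for residue in protein.upper()]


def sliding_windows(values, half_width, pad=0.0):
    """Return one fixed-length window per residue, centred on that residue.

    Each window holds 2 * half_width + 1 values. A larger half_width lets the
    input reach residues that are far apart in the sequence. Zero padding at
    the two ends is an assumption.
    """
    padded = [pad] * half_width + list(values) + [pad] * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(values))]


if __name__ == "__main__":
    values = hydrophobic_sequence("MKVLA")
    for window in sliding_windows(values, half_width=2):
        print(window)
```

Each window could then serve as the input vector for one residue to a classifier trained in the hydrophobic value space. The SVM trained in the Profile space and the arbitration rule are not sketched here because the abstract does not give enough detail to reproduce them.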