immunodeficiency syndrome (AIDS) is a fatal disease that seriously threatens
human health. Human immunodeficiency virus (HIV) is the pathogen that causes
this disease. Investigating HIV-1 protease cleavage sites can help researchers
find or develop protease inhibitors that restrain the replication of HIV-1 and
thus combat AIDS. Feature selection is a new approach to the HIV-1 protease
cleavage site prediction task and is a key point of our research.
Compared with previous work, our work has several advantages.
First, a filter method is used to eliminate redundant features. Second,
besides traditional orthogonal encoding (OE), we use two kinds of newly
proposed features, extracted by applying principal component analysis (PCA)
and non-linear Fisher transformation (NLF) to the AAindex database. The two
new features are shown to perform better than OE. Third, the data set used
here is substantially expanded, to 1922 samples.
To further improve prediction performance, we optimize the parameters of the
support vector machine (SVM) classifier so that it achieves better prediction
capability. We also fuse the three kinds of features to ensure a comprehensive
feature representation and improve prediction performance. To effectively evaluate the prediction
performance of our method, five evaluation measures, more than in previous
work, are used for a thorough comparison. The experimental results show that
our method achieves better performance than the state-of-the-art method. This
indicates that feature selection combined with feature fusion and classifier
parameter optimization can effectively improve HIV-1 cleavage site
prediction. Moreover, our work can provide useful help for the future
development of HIV-1 protease inhibitors.
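
As a brief illustration of the orthogonal encoding (OE) baseline mentioned above, the following minimal sketch maps each residue of a peptide to a 20-dimensional one-hot vector and concatenates the results. The octapeptide window, the residue ordering, and the function name orthogonal_encode are assumptions made for this example and are not taken from our implementation.

```python
# Standard 20 amino acids in one-letter code (the ordering is an arbitrary choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def orthogonal_encode(peptide):
    """Return the one-hot (orthogonal) encoding of a peptide as a flat list."""
    vector = []
    for residue in peptide:
        bits = [0] * len(AMINO_ACIDS)
        bits[INDEX[residue]] = 1  # raises KeyError for non-standard residues
        vector.extend(bits)
    return vector


# Example octapeptide, used only for illustration (window length is assumed).
encoded = orthogonal_encode("SQNYPIVQ")
print(len(encoded), sum(encoded))  # 160 8
```

Under these assumptions an octapeptide yields a sparse 160-dimensional binary vector with exactly one non-zero entry per residue position.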