PLOS ONE  2013 

A Consistency-Based Feature Selection Method Allied with Linear SVMs for HIV-1 Protease Cleavage Site Prediction

DOI: 10.1371/journal.pone.0063145


Abstract:

Background: Predicting type-1 Human Immunodeficiency Virus (HIV-1) protease cleavage sites in protein molecules and determining their specificity is an important task that has attracted considerable attention in the research community. Achievements in this area are expected to lead to effective drug design (especially for HIV-1 protease inhibitors) against this life-threatening virus. However, drawbacks such as the shortage of available training data and the high dimensionality of the feature space make this a difficult classification problem. Various machine learning techniques, and several classification methods in particular, have therefore been proposed to increase the accuracy of the classification model. In addition, for classification problems characterized by few samples and many features, selecting the most relevant features is a major factor in increasing classification accuracy.

Results: For HIV-1 data, we propose a consistency-based feature selection approach in conjunction with recursive feature elimination of support vector machines (SVMs). We used various classifiers to evaluate the results of the feature selection process. We further demonstrated the effectiveness of the proposed method by comparing it with a state-of-the-art feature selection method applied to HIV-1 data, and we evaluated the reported results based on attributes selected from different combinations.

Conclusion: Applying feature selection to the training data before performing the classification task appears to be a reasonable data-mining process when working with data similar to HIV-1. On HIV-1 data, several feature selection or extraction operations have been tested in conjunction with different classifiers, and noteworthy outcomes have been reported. These facts motivate the work presented in this paper.

Software availability: The software is available at http://ozyer.etu.edu.tr/c-fs-svm.rar.
The software can be downloaded from esnag.etu.edu.tr/software/hiv_cleavage_site_prediction.rar; the archive includes a readme file that explains how to set up the software.
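
As a rough illustration of the consistency criterion mentioned in the Results, the sketch below computes the inconsistency rate commonly used in consistency-based feature selection: samples are grouped by their values on a candidate feature subset, and every sample outside its group's majority class counts as inconsistent. The abstract gives no implementation details, so the function name `inconsistency_rate`, the toy data, and the assumption of discrete features are illustrative only and are not taken from the authors' software; the SVM recursive feature elimination stage is not shown.

```python
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels, feature_idx):
    """Return the share of samples that disagree with the majority label
    among samples sharing the same values on the chosen features.

    Hypothetical helper for illustration; assumes discrete feature values.
    """
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        # Project the sample onto the candidate feature subset.
        pattern = tuple(row[i] for i in feature_idx)
        groups[pattern][label] += 1
    # Within each pattern, everything except the majority class is inconsistent.
    inconsistent = sum(
        sum(counts.values()) - max(counts.values())
        for counts in groups.values()
    )
    return inconsistent / len(rows)


# Made-up toy data: three discrete features, binary label (1 = cleaved).
rows = [
    ("A", "x", "p"),
    ("A", "y", "p"),
    ("B", "x", "q"),
    ("B", "y", "q"),
    ("A", "x", "q"),
    ("B", "y", "p"),
]
labels = [1, 1, 0, 0, 1, 0]

rate_f0 = inconsistency_rate(rows, labels, [0])
rate_f1 = inconsistency_rate(rows, labels, [1])
rate_all = inconsistency_rate(rows, labels, [0, 1, 2])
print(rate_f0, rate_f1, rate_all)
```

A subset with a lower rate separates the classes more consistently: in this toy data, feature 0 alone already gives a rate of 0, whereas feature 1 alone leaves two of the six samples inconsistent. A consistency-based search would prefer small subsets whose rate matches that of the full feature set.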

