|
- 2017
Application of the Bag-of-Words Model to Protein Subcellular Localization Prediction
DOI: 10.3969/j.issn.1673-1689.2017.03.011
Keywords: bag-of-words model, K-means, support vector machine, subcellular localization prediction
Abstract: Much prior work has addressed protein feature extraction and subcellular localization prediction, but earlier studies showed that traditional feature extraction algorithms yield low prediction accuracy. To improve accuracy, this study combines the bag-of-words model with traditional protein feature extraction algorithms to extract features from protein sequences. First, the K-means algorithm is used to construct a feature dictionary. Then the bag-of-words features of each protein sequence are computed against that dictionary. Finally, the extracted features are fed into a multi-class SVM classifier to predict the subcellular location of the proteins in the data set. The results show that this approach improves the accuracy of subcellular localization prediction to some extent.
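The dictionary-building and bag-of-words steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which traditional feature extraction algorithm supplies the local descriptors, so the sliding-window amino-acid composition used here is an assumption, as are the window width, the dictionary size `k`, the toy sequences, and all function names.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def window_descriptors(seq, width=5):
    """Amino-acid composition of each sliding window.

    Assumption: stands in for the unspecified 'traditional' descriptor.
    """
    descs = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        descs.append([window.count(a) / width for a in AMINO_ACIDS])
    return descs


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means; the k centers act as the dictionary 'words'."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bow_histogram(seq, centers, width=5):
    """Normalized count of how often each dictionary word occurs in seq."""
    counts = [0] * len(centers)
    descs = window_descriptors(seq, width)
    for d in descs:
        counts[nearest(d, centers)] += 1
    total = len(descs)
    return [c / total for c in counts] if total else counts


# Hypothetical toy sequences, made up for illustration only.
sequences = [
    "MKTAYIAKQRQISFVKSHFSRQ",
    "MLSRAVCGTSRQLAPALGYLGS",
    "MSDNGELEDKPPAPPVRMSSTI",
]
all_descs = [d for s in sequences for d in window_descriptors(s)]
dictionary = kmeans(all_descs, k=4)
features = [bow_histogram(s, dictionary) for s in sequences]
```

Each entry of `features` is a fixed-length vector regardless of sequence length, which is what makes it usable as classifier input. In a fuller pipeline these vectors, together with known location labels, could be passed to a multi-class SVM such as scikit-learn's `sklearn.svm.SVC`; that step is omitted here to keep the sketch dependency-free.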
|