An Overview of DNA-Binding Protein for Feature Extraction Algorithms

DOI: 10.12677/HJCB.2020.102003, PP. 21-30

Keywords: DNA-Binding Protein, Feature Extraction, Sequence Information, Structure Information


Abstract:

Identifying and predicting DNA-binding proteins is important for studying the life activities of organisms and for understanding the mechanisms that underlie them. As the number of known protein sequences grows rapidly, computational methods offer increasing advantages over traditional experimental methods. In this paper, we summarize existing feature extraction methods for DNA-binding proteins, organized by whether they draw on the sequence information or the structure information of the protein. We then use the XGBoost algorithm to compare nine protein sequence feature extraction methods on the PDB1075 and PDB186 datasets. The results show that each feature extraction method has its own advantages and disadvantages. Among them, the Local_DPP method, which is based on the evolutionary information of protein sequences, achieves the best overall prediction performance.
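As a rough illustration of what a sequence-based feature extraction step can look like, the sketch below computes amino acid composition: the fraction of each of the 20 standard residues in a sequence. This descriptor is an assumption chosen for simplicity. The abstract does not list the nine methods compared, and this is not the Local_DPP method. The function name `amino_acid_composition` and the example sequence are hypothetical.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids, in a fixed order so every protein
# maps to a vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return a 20-dimensional vector of residue fractions (hypothetical helper)."""
    seq = sequence.upper()
    # Ignore non-standard symbols such as X or gaps.
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Made-up example sequence, for illustration only.
features = amino_acid_composition("MKKLAAKR")
```

A fixed-length vector of this kind, computed for every protein in a dataset, is the sort of input that a classifier such as XGBoost would then be trained on.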

