Over the past decade, both protein-level and amino acid-level features have been correlated with the crystallization propensity of proteins in order to develop methods that predict whether a protein can be crystallized. In this continuing study, each of three features that combine amino acid and protein characteristics was correlated with the crystallization propensity of proteins from Mycobacterium tuberculosis using logistic regression and neural network models. Two of the combined features, amino acid distribution probability and future composition, predicted whether a protein would be crystallized better than each of 531 individual amino acid features did. The third combined feature, amino acid pair predictability, revealed a trend in the crystallization propensity of proteins from Mycobacterium tuberculosis.
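The abstract describes correlating each feature with crystallization outcome through logistic (and neural network) models and then comparing how well the features predict. The following is a minimal sketch of the logistic half of that idea. It assumes each protein is reduced to a single numeric feature value and a binary label (1 = crystallized, 0 = not). It fits a univariate logistic regression by gradient ascent on the log-likelihood and scores the fit with the ROC area under the curve, which is one common way to compare predictors. The function names, learning rate, epoch count, and toy data are illustrative assumptions rather than the authors' procedure. The neural network models are not shown.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x: Sequence[float], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 5000) -> Tuple[float, float]:
    """Fit P(crystallized | x) = sigmoid(b0 + b1 * x) by batch gradient ascent.

    x: one (hypothetical) feature value per protein; y: 1 if crystallized, else 0.
    """
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            residual = yi - sigmoid(b0 + b1 * xi)  # gradient of the log-likelihood
            g0 += residual
            g1 += residual * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def predict_proba(b0: float, b1: float, xi: float) -> float:
    """Predicted probability that a protein with feature value xi crystallizes."""
    return sigmoid(b0 + b1 * xi)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC area under the curve via the Mann-Whitney statistic.

    Probability that a randomly chosen crystallized protein scores higher than a
    randomly chosen non-crystallized one; ties count as one half.
    """
    pos: List[float] = [s for s, l in zip(scores, labels) if l == 1]
    neg: List[float] = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up data: feature values and crystallization outcomes.
    xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
    ys = [0, 0, 0, 1, 1, 1]
    b0, b1 = fit_logistic(xs, ys)
    probs = [predict_proba(b0, b1, v) for v in xs]
    print(f"b0={b0:.3f}, b1={b1:.3f}, AUC={auc(probs, ys):.3f}")
```

In a comparison like the one the abstract describes, the same fit-and-score loop would be repeated once per feature, for example once for each of the 531 amino acid features and once for each combined feature. The resulting AUC values could then be ranked. Because a univariate logistic model is monotone in x, its AUC equals the AUC of the raw feature when b1 > 0. The model becomes more informative once several features are combined.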