Evolutionary Relationship of Protein Sequences of SARS-CoV-2 and Other Viruses through Chaos Game Representation

DOI: 10.4236/cmb.2022.123008, PP. 123-143

Keywords: Chaos Game Representation (CGR), Protein, Multi-Dimensional Scaling (MDS)


Abstract:

Comparing biological sequences is a key step in bioinformatics for analyzing sequence similarity and phylogenetic relationships. Chaos Game Representation (CGR), a method for representing biological sequences graphically, has found many applications in bioinformatics. The key issue in applying CGR is extracting as many useful features as possible from the representation. CGR was originally applied to DNA sequences; in this paper, a CGR-based approach is used to extract features suitable for comparing protein sequences of SARS-CoV-2 and other viruses. To this end, several viral protein sequences from 12 groups are considered, and the CGR centroid, amino acid frequency, compounded frequency, Shannon entropy, and Kullback-Leibler discrimination information are applied to find the inter-relationships among the sequences. The experimental results demonstrate the potential strengths of the CGR-based method for examining the evolutionary relationships of protein sequences. Our method is effective at extracting features from protein sequences and is therefore useful for classifying proteins and inferring the phylogeny of viruses.
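
To make the features named in the abstract concrete, the following is a minimal Python sketch of a protein CGR and a few of those features (centroid, amino acid frequency, Shannon entropy, and Kullback-Leibler discrimination information). It is not the authors' implementation: the abstract does not specify the construction, so the 20-vertex regular polygon, the contraction ratio of 0.5, the starting point at the origin, the pseudocount, and all function names and toy sequences are assumptions made for illustration.

```python
import math
from collections import Counter

# The 20 standard amino acids, ordered by one-letter code (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: each residue is assigned one vertex of a regular 20-gon on the unit circle.
VERTICES = {
    aa: (math.cos(2 * math.pi * i / 20), math.sin(2 * math.pi * i / 20))
    for i, aa in enumerate(AMINO_ACIDS)
}


def cgr_points(seq, ratio=0.5):
    """Chaos game: starting at the origin, move a fraction `ratio` of the
    way from the current point toward the vertex of each residue in turn."""
    x, y = 0.0, 0.0
    points = []
    for aa in seq:
        if aa not in VERTICES:
            continue  # skip ambiguous or non-standard residue codes
        vx, vy = VERTICES[aa]
        x += ratio * (vx - x)
        y += ratio * (vy - y)
        points.append((x, y))
    return points


def centroid(points):
    """Mean position of the CGR points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def frequencies(seq, pseudocount=0.0):
    """Relative amino acid frequencies, optionally smoothed by a pseudocount."""
    counts = Counter(aa for aa in seq if aa in VERTICES)
    total = sum(counts.values()) + 20 * pseudocount
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def shannon_entropy(freqs):
    """Shannon entropy in bits of a frequency distribution."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.
    Requires q[aa] > 0 wherever p[aa] > 0; a pseudocount ensures this."""
    return sum(p[aa] * math.log2(p[aa] / q[aa]) for aa in AMINO_ACIDS if p[aa] > 0)


if __name__ == "__main__":
    # Hypothetical toy sequences, not real viral proteins.
    seq_a = "MFVFLVLLPLVSSQCVNLT"
    seq_b = "MKIILFLALITLATCELYH"
    fa = frequencies(seq_a, pseudocount=1.0)
    fb = frequencies(seq_b, pseudocount=1.0)
    print("centroid A:", centroid(cgr_points(seq_a)))
    print("entropy A (bits):", shannon_entropy(fa))
    print("KL(A || B) (bits):", kl_divergence(fa, fb))
```

Because the Kullback-Leibler divergence is asymmetric, a pairwise distance matrix for Multi-Dimensional Scaling would typically need a symmetrized form, such as the sum of the two directed divergences; whether the paper does this is not stated in the abstract.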

