Condensed Matrix Descriptor for Protein Sequence Comparison

DOI: 10.4236/ijamsc.2016.41001, pp. 1-13

Keywords: Amino Acids, Condensed Matrix, Eigenvalues, Matrix Invariants, ALE Index


Abstract:

The present paper develops a novel way of reducing a protein sequence of any length to a real symmetric, condensed 20 × 20 matrix. This condensed matrix serves as a descriptor of the protein sequence: with such a representation, comparing two protein sequences reduces to comparing two 20 × 20 matrices. Because each square matrix has a unique Alley Index and normalized Alley Index, these indices are used to compute the distance matrix from which phylogenetic trees of the protein sequences are constructed. Finally, the protein sequences are compared on the basis of these phylogenetic trees. Three types of protein sequences are used for the comparison: NADH dehydrogenase subunit 3 (ND3), subunit 4 (ND4) and subunit 5 (ND5). They come from nine species (human, gorilla, common chimpanzee, pygmy chimpanzee, fin whale, blue whale, rat, mouse and opossum).
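The abstract does not define how the condensed matrix or the Alley Index is computed. The Python sketch below therefore only illustrates the general shape of the pipeline (sequence → 20 × 20 symmetric matrix → scalar index → distance matrix → tree). Every concrete choice in it is an assumption, not the authors' definition:

- the weighting rule in the hypothetical `condensed_matrix`, where each pair of positions i < j adds 1/(j - i) to the cell of its two residues;
- the leading eigenvalue divided by the number of residues, used as a stand-in for the (normalized) Alley Index;
- the absolute difference of indices, used as the distance;
- UPGMA, used as the tree-building method.

```python
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes
POS = {aa: k for k, aa in enumerate(AMINO_ACIDS)}


def residues_of(seq):
    """Row/column indices of the standard residues in seq (other letters skipped)."""
    return [POS[c] for c in seq.upper() if c in POS]


def condensed_matrix(seq):
    """HYPOTHETICAL 20 x 20 symmetric condensed matrix: every pair of positions
    i < j adds 1/(j - i) to the cell of its two residues (assumed weighting)."""
    res = residues_of(seq)
    m = [[0.0] * 20 for _ in range(20)]
    for i, j in combinations(range(len(res)), 2):
        a, b = res[i], res[j]
        w = 1.0 / (j - i)
        m[a][b] += w
        if a != b:
            m[b][a] += w  # mirror the entry so the matrix stays symmetric
    return m


def leading_eigenvalue(m, iters=5000, tol=1e-12):
    """Largest eigenvalue of a symmetric nonnegative matrix by shifted power iteration."""
    n = len(m)
    # The max row sum bounds the spectral radius, so m + shift*I has no negative
    # eigenvalues and its top eigenvalue is also the one of largest magnitude.
    shift = max(sum(row) for row in m)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(m[i][k] * v[k] for k in range(n)) + shift * v[i] for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0  # zero matrix
        v = [x / norm for x in w]
        # Rayleigh quotient of the unshifted matrix at the current unit vector
        new_lam = sum(v[i] * m[i][k] * v[k] for i in range(n) for k in range(n))
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam


def sequence_index(seq):
    """STAND-IN for the normalized index: leading eigenvalue / number of residues."""
    n = len(residues_of(seq))
    return leading_eigenvalue(condensed_matrix(seq)) / n if n else 0.0


def upgma(labels, dist):
    """UPGMA (a stand-in tree method) on a square distance matrix; returns nested tuples."""
    clusters = {i: (label, 1) for i, label in enumerate(labels)}  # id -> (subtree, size)
    d = {frozenset((i, j)): dist[i][j] for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair of current clusters
        i, j = tuple(pair)
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        del d[pair]
        for k in clusters:  # size-weighted average distance to the merged cluster
            dik, djk = d.pop(frozenset((i, k))), d.pop(frozenset((j, k)))
            d[frozenset((next_id, k))] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = ((ti, tj), ni + nj)
        next_id += 1
    return next(iter(clusters.values()))[0]


if __name__ == "__main__":
    # Toy strings only; real use would take the ND3/ND4/ND5 sequences of the nine species.
    toy = {"toyA": "ACDEFGHIKL", "toyB": "ACDEFGHIKV", "toyC": "WWYYWWYYWW"}
    names = list(toy)
    idx = {name: sequence_index(seq) for name, seq in toy.items()}
    dist = [[abs(idx[a] - idx[b]) for b in names] for a in names]
    print(upgma(names, dist))
```

Reducing each matrix to a single scalar makes the distance matrix trivial to compute for any number of sequences. The cost is that most of the information in the 20 × 20 matrix is discarded, so two different matrices can receive nearly identical indices.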

