
Use of FFT in Protein Sequence Comparison under Their Binary Representations

DOI: 10.4236/cmb.2016.62003, PP. 33-40

Keywords: Voss Type Representation, Inter-Coefficient Difference (ICD) Method, Distance Matrix, Phylogenetic Tree, Fast Fourier Transform (FFT), ND5 and ND6 Category of Protein



The paper adopts a Voss-type representation of amino acids, in which a protein sequence is converted into binary indicator sequences. It then applies the fast Fourier transform (FFT) to these binary sequences to obtain their spectra in the frequency domain. These spectra are analyzed with the inter-coefficient difference (ICD) method in order to compare protein sequences of the ND5 and ND6 categories. The purpose of the paper is to extend the ICD method, previously used to compare DNA sequences, to the comparison of protein sequences, and thereby to develop a novel method of protein sequence comparison. The main achievements are that the method is new of its kind as far as protein sequence comparison is concerned, and that its results agree with those obtained previously by other methods for the same categories of protein sequences.



