|
Studying the Polypeptide Sequence (α-Code) of Escherichia coli

DOI: 10.1155/2013/961378

Abstract: This paper is devoted to algebraically simulating the α-code of the bacterium Escherichia coli and to studying contrast factors (words) in its polypeptide sequence. We use methods from the spectral theory of graphs that we previously employed for enumerating De Bruijn and Kautz sequences. The empirical material is taken from a computer investigation of contrast factors in the polypeptide sequences of prokaryotes.

1. Introduction

It was proposed [1, 2] to divide 19 of the 20 amino acids into two subgroups: an alanine subgroup (of fatty amino acids: alanine, phenylalanine, isoleucine, leucine, methionine, proline, threonine, and valine) and a glycine subgroup (of more polar amino acids: cysteine, aspartic acid, glutamic acid, glycine, histidine, lysine, asparagine, glutamine, arginine, tryptophan, and tyrosine), while serine remains a spare element in the full classification. In shorthand notation, this gives a, f, i, l, m, p, t, and v (the alanine subgroup); c, d, e, g, h, k, n, q, r, w, and y (the glycine subgroup); and s (a free character). The numbers 1, 2, and 3 were chosen to represent the two main subgroups and the character s, respectively. Crude statistics, with the natural ratio of 1s to 2s being 0.526:0.474, predicted an almost regular distribution (alternation) of the two ciphers. To check this hypothesis, the frequencies of all ( ) possible substrings of the length in the genomic sequence of E. coli were determined. The results showed at once that the perfectly alternating substrings are in fact the least frequent ones in the entire genomic sequence. However, visual inspection suggested that the main condition for a near-statistical distribution of the two subgroups of amino acids may lie in grouping equal ciphers (either 1 or 2), representing the respective subgroups, into adjacent pairs.
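The encoding and the substring count described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' program: the function names `encode` and `substring_frequencies`, the use of uppercase one-letter amino acid codes, the toy peptide, and the window length `k = 2` are all assumptions made here for the example.

```python
from collections import Counter

# Subgroups as listed in the text, written as standard one-letter codes.
ALANINE_GROUP = set("AFILMPTV")     # cipher 1
GLYCINE_GROUP = set("CDEGHKNQRWY")  # cipher 2
FREE_CHARACTER = "S"                # serine, cipher 3


def encode(protein: str) -> str:
    """Map a one-letter amino acid string to a string over the ciphers 1, 2, 3."""
    ciphers = []
    for residue in protein.upper():
        if residue in ALANINE_GROUP:
            ciphers.append("1")
        elif residue in GLYCINE_GROUP:
            ciphers.append("2")
        elif residue == FREE_CHARACTER:
            ciphers.append("3")
        else:
            raise ValueError(f"unexpected residue: {residue!r}")
    return "".join(ciphers)


def substring_frequencies(code: str, k: int) -> Counter:
    """Count every length-k substring of the cipher string (overlapping windows)."""
    return Counter(code[i:i + k] for i in range(len(code) - k + 1))


# Hypothetical toy peptide, not taken from the E. coli data.
toy_code = encode("MKTAYS")
toy_counts = substring_frequencies(toy_code, 2)
print(toy_code)
print(toy_counts.most_common())
```

Applied to a whole proteome, the same counter would make it possible to compare the frequencies of perfectly alternating substrings with those of substrings built from adjacent pairs of equal ciphers.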
That is, in lieu of the code it should be , where the overall ratio of ciphers thus stays unaltered. The pairing of equal ciphers reminded us of the so-called α-code in polypeptides, according to which one turn of the polypeptide spiral involves 3.5 amino acids on average. Since the smallest integer multiple of 3.5 is 7, it was of interest to interpret the α-code as one in which all structural features arise from conditionally grouping amino acids into consecutive sevens. Merging the ideas of sevens (which suit the interpretation of the α-code well) and pairs (which better obey the natural proportion of amino acids and agree with experimental observation) allows us to set
|