We begin here a combinatorial study of dinucleotide circular codes. A word written on a circle is called circular. A set of dinucleotides is a circular code if every circular word constructed with this set has a unique decomposition. Propositions based on a letter necklace make it possible to determine the 24 maximum dinucleotide circular codes (of 6 elements). A partition property is also identified: the 24 codes split into eight self-complementary maximum dinucleotide circular codes and two classes of eight maximum dinucleotide circular codes in bijective correspondence by the complementarity map.

1. Introduction

We continue our study of the combinatorial properties of circular codes in genes, that is, on the nucleotide alphabet {A, C, G, T}. A dinucleotide is a word of two letters (diletter) on this alphabet. A trinucleotide is a word of three letters (triletter) on this alphabet. The two sets of 16 dinucleotides and 64 trinucleotides are codes in the sense of language theory but not circular codes [1, 2]. Intuitively, codes are written on a straight line, while circular codes are written on a circle, but, in both cases, unique decipherability is required.

Trinucleotide comma-free codes, a very particular case of trinucleotide circular codes, have been studied for a long time; see, for example, [3–5]. Since the discovery of a trinucleotide circular code in genes with strong mathematical properties [6], circular codes have been studied as mathematical objects in combinatorics, theoretical computer science, and theoretical biology. This theory has developed rapidly; see, for example, [7–27].

Trinucleotides are the fundamental words for genes, that is, the DNA sequences coding the amino acids constituting the protein sequences. However, dinucleotides are also words with important biological functions in genomes. Dinucleotides are involved in some genome sites; for example, the splice sites of introns in eukaryotic genomes are based on the dinucleotides GT and AG [28, 29].
Dinucleotides are also involved in some genome regions; for example, the dinucleotide CG in animal and plant genomes allows a positive or negative control over gene expression [30], and several dinucleotides [31–34] in eukaryotic genomes occur as concatenated words (called tandem repeats in biology).

We begin here a new combinatorial study of dinucleotide circular codes. Their number, their list, and a partition according to the complementarity map are determined with propositions based on a letter necklace.

2. Preliminaries

The following definitions and propositions are classical for any finite
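
As an illustration only, the counts announced in the abstract (24 maximum dinucleotide circular codes of 6 elements, 8 of them self-complementary) can be checked by brute force. The sketch below does not use the letter-necklace propositions. It assumes instead an elementary criterion: a circular word of even length has only two reading frames, so it has two decompositions exactly when every pair of consecutive letters belongs to the set, that is, when the graph with an arrow a -> b for each dinucleotide ab contains a closed walk. It also assumes that the complementarity map is the usual reverse complement (A <-> T, C <-> G, with the two letters swapped). The names is_circular, complement, and maximum_codes are hypothetical helpers introduced for this example.

```python
from itertools import combinations, product

ALPHABET = "ACGT"
# Assumption: complementarity pairs A with T and C with G.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
DINUCLEOTIDES = ["".join(p) for p in product(ALPHABET, repeat=2)]


def is_circular(code):
    """Return True if the graph with an arrow a -> b per dinucleotide ab is acyclic."""
    successors = {a: set() for a in ALPHABET}
    for a, b in code:
        successors[a].add(b)
    # Repeatedly delete letters with no arrow to a remaining letter.
    # The graph is acyclic exactly when every letter is eventually deleted.
    remaining = set(ALPHABET)
    while remaining:
        sinks = {a for a in remaining if not (successors[a] & remaining)}
        if not sinks:
            return False  # every remaining letter lies on or leads to a cycle
        remaining -= sinks
    return True


def complement(code):
    """Reverse complement of each dinucleotide (assumed complementarity map)."""
    return frozenset(COMPLEMENT[b] + COMPLEMENT[a] for a, b in code)


def maximum_codes():
    """Return the largest size admitting a circular code and all codes of that size."""
    for size in range(len(DINUCLEOTIDES), 0, -1):
        found = [frozenset(c) for c in combinations(DINUCLEOTIDES, size)
                 if is_circular(c)]
        if found:
            return size, found
    return 0, []


size, codes = maximum_codes()
self_comp = [c for c in codes if complement(c) == c]
print(size, len(codes), len(self_comp))  # prints: 6 24 8
```

Under these assumptions, each maximum code corresponds to one of the 4! = 24 linear orders of the four letters, which is why the exhaustive search stops at size 6.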
References
[1] J. Berstel and D. Perrin, Theory of Codes, Academic Press, London, UK, 1985.
[2] J. L. Lassez, “Circular codes and synchronization,” International Journal of Computer and Information Sciences, vol. 5, no. 2, pp. 201–208, 1976.
[3] F. H. C. Crick, J. S. Griffith, and L. E. Orgel, “Codes without commas,” Proceedings of the National Academy of Sciences, vol. 43, pp. 416–421, 1957.
[4] S. W. Golomb, B. Gordon, and L. R. Welch, “Comma-free codes,” Canadian Journal of Mathematics, vol. 10, pp. 202–209, 1958.
[5] S. W. Golomb, L. R. Welch, and M. Delbrück, “Construction and properties of comma-free codes,” Biologiske Meddelelser, Kongelige Danske Videnskabernes Selskab, vol. 23, no. 9, 1958.
[6] D. G. Arquès and C. J. Michel, “A complementary circular code in the protein coding genes,” Journal of Theoretical Biology, vol. 182, no. 1, pp. 45–58, 1996.
[7] A. J. Koch and J. Lehmann, “About a symmetry of the genetic code,” Journal of Theoretical Biology, vol. 189, no. 2, pp. 171–174, 1997.
[8] M. P. Béal and J. Senellart, “On the bound of the synchronization delay of a local automaton,” Theoretical Computer Science, vol. 205, no. 1-2, pp. 297–306, 1998.
[9] F. Bassino, “Generating functions of circular codes,” Advances in Applied Mathematics, vol. 22, no. 1, pp. 1–24, 1999.
[10] R. Jolivet and F. Rothen, “Peculiar symmetry of DNA sequences and evidence suggesting its evolutionary origin in a primeval genetic code,” in Proceedings of the 1st European Workshop in Exo-/Astro-Biology, P. Ehrenfreund, O. Angerer, and B. Battrick, Eds., ESA SP-496, pp. 173–176, Noordwijk, The Netherlands.
[11] G. Frey and C. J. Michel, “Circular codes in archaeal genomes,” Journal of Theoretical Biology, vol. 223, no. 4, pp. 413–431, 2003.
[12] C. Nikolaou and Y. Almirantis, “Mutually symmetric and complementary triplets: differences in their use distinguish systematically between coding and non-coding genomic sequences,” Journal of Theoretical Biology, vol. 223, no. 4, pp. 477–487, 2003.
[13] G. Pirillo, “A characterization for a set of trinucleotides to be a circular code,” in Determinism, Holism, and Complexity, C. Pellegrini, P. Cerrai, P. Freguglia, V. Benci, and G. Israel, Eds., Kluwer Academic Publisher, New York, NY, USA, 2003.
[14] G. Pirillo and M. A. Pirillo, “Growth function of self-complementary circular codes,” Biology Forum, vol. 98, no. 1, pp. 97–110, 2005.
[15] G. Frey and C. J. Michel, “Identification of circular codes in bacterial genomes and their use in a factorization method for retrieving the reading frames of genes,” Computational Biology and Chemistry, vol. 30, no. 2, pp. 87–101, 2006.
[16] J. L. Lassez, R. A. Rossi, and A. E. Bernal, “Crick's hypothesis revisited: the existence of a universal coding frame,” in Proceedings of the 21st International Conference on Advanced Information Networking and Applications Workshops (AINAW '07), pp. 745–751, Niagara Falls, Canada, May 2007.
[17] C. J. Michel, G. Pirillo, and M. A. Pirillo, “Varieties of comma-free codes,” Computers and Mathematics with Applications, vol. 55, no. 5, pp. 989–996, 2008.
[18] C. J. Michel, G. Pirillo, and M. A. Pirillo, “A relation between trinucleotide comma-free codes and trinucleotide circular codes,” Theoretical Computer Science, vol. 401, no. 1–3, pp. 17–26, 2008.
[19] G. Pirillo, “A hierarchy for circular codes,” RAIRO-Theoretical Informatics and Applications, vol. 42, no. 4, pp. 717–728, 2008.
[20] G. Pirillo, “Some remarks on prefix and suffix codes,” Pure Mathematics and Applications, vol. 19, pp. 53–60, 2008.
[21] C. J. Michel and G. Pirillo, “Identification of all trinucleotide circular codes,” Computational Biology and Chemistry, vol. 34, no. 2, pp. 122–125, 2010.
[22] G. Pirillo, “Non sharing border codes,” The Advances in Applied Mathematics and Mechanics, vol. 3, pp. 215–223, 2010.
[23] C. J. Michel and G. Pirillo, “Strong trinucleotide circular codes,” International Journal of Combinatorics, vol. 2011, Article ID 659567, 14 pages, 2011.
[24] L. Bussoli, C. J. Michel, and G. Pirillo, “On some forbidden configurations for self-complementary trinucleotide circular codes,” Journal for Algebra Number Theory Academia, vol. 2, pp. 223–232, 2011.
[25] D. L. Gonzalez, S. Giannerini, and R. Rosa, “Circular codes revisited: a statistical approach,” Journal of Theoretical Biology, vol. 275, no. 1, pp. 21–28, 2011.
[26] L. Bussoli, C. J. Michel, and G. Pirillo, “On conjugation partitions of sets of trinucleotides,” Applied Mathematics, vol. 3, pp. 107–112, 2012.
[27] C. J. Michel, G. Pirillo, and M. A. Pirillo, “A classification of 20-trinucleotide circular codes,” Information and Computation, vol. 212, pp. 55–63, 2012.
[28] M. Burset, I. A. Seledtsov, and V. V. Solovyev, “Analysis of canonical and non-canonical splice sites in mammalian genomes,” Nucleic Acids Research, vol. 28, no. 21, pp. 4364–4375, 2000.
[29] S. M. Mount, “A catalogue of splice junction sequences,” Nucleic Acids Research, vol. 10, no. 2, pp. 459–472, 1982.
[30] A. Bird, “The dinucleotide CG as a genomic signalling module,” Journal of Molecular Biology, vol. 409, no. 1, pp. 47–53, 2011.
[31] F. Gebhardt, K. S. Zänker, and B. Brandt, “Modulation of epidermal growth factor receptor gene transcription by a polymorphic dinucleotide repeat in intron 1,” Journal of Biological Chemistry, vol. 274, no. 19, pp. 13176–13180, 1999.
[32] H. Buerger, J. Packeisen, A. Boecker et al., “Allelic length of a CA dinucleotide repeat in the egfr gene correlates with the frequency of amplifications of this sequence—first results of an inter-ethnic breast cancer study,” Journal of Pathology, vol. 203, no. 1, pp. 545–550, 2004.
[33] A. L. Schmidt and V. Mitter, “Microsatellite mutation directed by an external stimulus,” Mutation Research, vol. 568, no. 2, pp. 233–243, 2004.
[34] H. Cuppens, W. Lin, M. Jaspers et al., “Polyvariant mutant cystic fibrosis transmembrane conductance regulator genes: the polymorphic (TG)m locus explains the partial penetrance of the T5 polymorphism as a disease mutation,” Journal of Clinical Investigation, vol. 101, no. 2, pp. 487–496, 1998.
[35] J. Rozenski, P. F. Crain, and J. A. McCloskey, “The RNA modification database: 1999 update,” Nucleic Acids Research, vol. 27, no. 1, pp. 196–197, 1999.