%0 Journal Article
%T Detection of Introns in Eukaryotic Small Subunit Ribosomal RNA Gene Sequences
%A Dipankar Bachar
%A Laure Guillou
%A Richard Christen
%J Dataset Papers in Science
%D 2013
%R 10.7167/2013/854869
%X The gene encoding SSU-rRNA is the tool of choice for phylogenetic and environmental biodiversity analyses of bacteria and Archaea, but also of unicellular Eukaryota. In Eukaryota, these gene sequences are often interrupted by long introns or by several introns. Searching GenBank release 188, we found descriptions of 3638 such sequences. Using a database of 180 000 SSU-rRNA sequences well annotated for taxonomy and a C++ program written for that purpose, we computed the presence of 18 691 introns (including the 3638 described introns). After filtering on length and sequence quality, 3646 sequences were retained. These introns were clustered, and the clusters were analyzed for the presence of single or multiple clades at various levels of taxonomic depth, allowing future analyses of horizontal transfers. Various analyses of the results are provided as tabulated files, together with FASTA files of the described and computed introns. Each sequence is annotated with its cellular location (nuclear, chloroplast, or mitochondrial), the positions at which it was found in the SSU-rRNA sequences, and its taxonomy as provided by GenBank.
1. Introduction
The gene that microbiologists use to determine the taxonomic affiliations of microbes by molecular methods needs to meet a number of requirements. It has to be functionally conserved and present in every organism analyzed. Often, the presence of conserved domains is also required, allowing the design and use of universal PCR primers. Finally, sequences from most known organisms must be available in the public databases (i.e., the International Nucleotide Sequence Database Collaboration (INSDC) between the Japanese, European, and American nucleotide databases, respectively DDBJ, ENA, and GenBank, http://www.insdc.org/).
One gene meets these requirements particularly well: the gene encoding the RNA of the small ribosomal subunit (one of the two subunits that together form the ribosome), known as the small subunit ribosomal RNA (SSU-rRNA) gene. For that reason, the gene encoding the SSU-rRNA is a prominent tool for phylogenetic and environmental biodiversity analyses of bacteria and Archaea, but also of unicellular Eukaryota [1–3]. SSU-rRNA gene sequences may contain numerous self-splicing introns of variable lengths [4–23], so that SSU-rRNA genes can reach up to 3.5 kb. Introns have rarely been identified in bacterial SSU-rRNA gene sequences (see one example in Thiomargarita namibiensis [5]), but they are often present in the SSU-rRNA gene sequences of Eukaryota (see the references cited above). Such length heterogeneity of SSU-rRNA gene sequences has so far seldom been considered when constructing
%U http://www.hindawi.com/journals/dpis/2013/854869/