oalib
Modeling compositional dynamics based on GC and purine contents of protein-coding sequences
Zhang Zhang, Jun Yu
Biology Direct, 2010, DOI: 10.1186/1745-6150-5-63
Abstract: To model the compositional dynamics of protein-coding sequences, we propose two simple models that take into account both mutation and selection effects, which act differently at the three codon positions, and use both GC and purine contents as compositional parameters. The two models concern the theoretical composition of nucleotides, codons, and amino acids, with no prerequisite of homologous sequences or their alignments. We evaluated the two models by quantifying theoretical compositions of a large collection of protein-coding sequences (including 46 of Archaea, 686 of Bacteria, and 826 of Eukarya), yielding consistent theoretical compositions across all the collected sequences. We show that the compositions of nucleotides, codons, and amino acids are largely determined by both GC and purine contents and suggest that deviations of the observed from the expected compositions may reflect compositional signatures that arise from a complex interplay between mutation and selection via DNA replication and repair mechanisms. This article was reviewed by Zhaolei Zhang (nominated by Mark Gerstein), Guruprasad Ananda (nominated by Kateryna Makova), and Daniel Haft. Compositional biases in the contexts of nucleotides, codons, and amino acids are found among bacteria [1-4], fungi [5,6], insects [7-10], plants [11,12], and vertebrates [13,14], which presumably arise from unbalanced forces of mutation and selection and are maintained by the species in their populations [15-17]. For any individual gene, its compositional biases reflect the action of both mutation and selection, which is also linked to the abundance of iso-accepting transfer RNAs and the catalytic efficiencies of their synthetases, and thereby to translation efficiencies [2,6,18-22].
Therefore, composition analysis is of great significance in better understanding compositional dynamics in order to provide evidence for molecular evolution [23,24]. Nucleotide compositions are highly variable among genomes, and the guanine-p
Compositional representation of protein sequences and the number of Eulerian loops
Bailin Hao, Huimin Xie, Shuyu Zhang
Physics, 2001
Abstract: An amino acid sequence of a protein may be decomposed into consecutive overlapping strings of length K. How unique is the converse, i.e., reconstruction of amino acid sequences using the set of K-strings obtained in the decomposition? This problem may be transformed into the problem of counting the number of Eulerian loops in an Euler graph, though the well-known formula must be modified. By exhaustive enumeration and by using the modified formula we show that the reconstruction is unique for K equal to or greater than 5 for an overwhelming majority of the proteins in the PDB.seq database. The corresponding Euler graphs provide a means to study the structure of repeated segments in protein sequences.
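The decomposition and the uniqueness question in this abstract can be illustrated with a small brute-force sketch. It is not the authors' modified counting formula: it simply walks the graph whose edges are the K-strings and whose nodes are (K-1)-strings, and counts the distinct sequences that use every K-string exactly as often as the original does. The function names and the choice to fix the starting (K-1)-string are assumptions made for illustration, and the enumeration is only practical for short sequences.

```python
from collections import Counter


def k_strings(seq, k):
    """Return the consecutive overlapping strings of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_reconstructions(seq, k):
    """Count distinct sequences that share seq's multiset of K-strings.

    Assumption for this sketch: the reconstruction starts with the same
    (k-1)-string as seq. Requires k >= 2. Exhaustive search, so only
    suitable for short sequences.
    """
    remaining = Counter(k_strings(seq, k))
    total = sum(remaining.values())

    def walk(node, used):
        # node is the current (k-1)-string; each unused K-string that
        # begins with it is an edge we may follow next.
        if used == total:
            return 1
        found = 0
        candidates = [s for s in remaining
                      if remaining[s] > 0 and s.startswith(node)]
        for s in candidates:
            remaining[s] -= 1
            found += walk(s[1:], used + 1)
            remaining[s] += 1
        return found

    return walk(seq[:k - 1], 0)


if __name__ == "__main__":
    toy = "ABACAD"  # toy string, not a real protein
    for k in (2, 3):
        print(k, count_reconstructions(toy, k))
```

For the toy string, K = 2 admits two reconstructions (ABACAD and ACABAD) while K = 3 admits only one, which mirrors the abstract's point that reconstruction becomes unique once K is large enough.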
Systematic Analysis of Compositional Order of Proteins Reveals New Characteristics of Biological Functions and a Universal Correlate of Macroevolution
Erez Persi, David Horn
PLOS Computational Biology, 2013, DOI: 10.1371/journal.pcbi.1003346
Abstract: We present a novel analysis of compositional order (CO) based on the occurrence of Frequent amino-acid Triplets (FTs) that appear much more than random in protein sequences. The method captures all types of proteomic compositional order including single amino-acid runs, tandem repeats, periodic structure of motifs and otherwise low complexity amino-acid regions. We introduce new order measures, distinguishing between ‘regularity’, ‘periodicity’ and ‘vocabulary’, to quantify these phenomena and to facilitate the identification of evolutionary effects. Detailed analysis of representative species across the tree-of-life demonstrates that CO proteins exhibit numerous functional enrichments, including a wide repertoire of particular patterns of dependencies on regularity and periodicity. Comparison between human and mouse proteomes further reveals the interplay of CO with evolutionary trends, such as faster substitution rate in mouse leading to decrease of periodicity, while innovation along the human lineage leads to larger regularity. Large-scale analysis of 94 proteomes leads to systematic ordering of all major taxonomic groups according to FT-vocabulary size. This is measured by the count of Different Frequent Triplets (DFT) in proteomes. The latter provides a clear hierarchical delineation of vertebrates, invertebrates, plants, fungi and prokaryotes, with thermophiles showing the lowest level of FT-vocabulary. Among eukaryotes, this ordering correlates with phylogenetic proximity. Interestingly, in all kingdoms CO accumulation in the proteome has universal characteristics. We suggest that CO is a genomic-information correlate of both macroevolution and various protein functions. The results indicate a mechanism of genomic ‘innovation’ at the peptide level, involved in protein elongation, shaped in a universal manner by mutational and selective forces.
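As a rough illustration of what "triplets that appear much more than random" could mean, the sketch below counts overlapping amino-acid triplets in one sequence and compares each count with the number expected if residues were drawn independently at the sequence's own single-residue frequencies. The null model, the thresholds `min_ratio` and `min_count`, and the function name are all assumptions for this example; the abstract does not specify the authors' actual criterion.

```python
from collections import Counter


def frequent_triplets(seq, min_ratio=3.0, min_count=2):
    """Return {triplet: (observed, expected)} for over-represented triplets.

    Assumed null model: residues are independent, with probabilities
    equal to their frequencies in seq. A triplet is reported when it
    occurs at least min_count times and at least min_ratio times more
    often than expected. Both thresholds are hypothetical.
    """
    n = len(seq)
    windows = n - 2  # number of overlapping triplet positions
    residue_freq = {a: c / n for a, c in Counter(seq).items()}
    observed = Counter(seq[i:i + 3] for i in range(windows))

    result = {}
    for triplet, count in observed.items():
        expected = windows
        for residue in triplet:
            expected *= residue_freq[residue]
        if count >= min_count and count / expected >= min_ratio:
            result[triplet] = (count, expected)
    return result


if __name__ == "__main__":
    # Toy sequence: a short GP tandem repeat followed by unique residues.
    toy = "GPGPGPGPGP" + "ACDEFHIKLM"
    for triplet, (obs, exp) in sorted(frequent_triplets(toy).items()):
        print(triplet, obs, round(exp, 3))
```

On the toy sequence only the two triplets from the tandem repeat (GPG and PGP) pass both thresholds, which is the kind of low-complexity signal the abstract describes.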
In Silico Characterization of Pectate Lyase Protein Sequences from Different Source Organisms
Amit Kumar Dubey, Sangeeta Yadav, Manish Kumar, Vinay Kumar Singh, Bijaya Ketan Sarangi, Dinesh Yadav
Enzyme Research, 2010, DOI: 10.4061/2010/950230
Abstract: A total of 121 protein sequences of pectate lyases were subjected to homology search, multiple sequence alignment, phylogenetic tree construction, and motif analysis. The phylogenetic tree constructed revealed different clusters based on different source organisms representing bacterial, fungal, plant, and nematode pectate lyases. The multiple accessions of bacterial, fungal, nematode, and plant pectate lyase protein sequences were placed closely, revealing a sequence-level similarity. The multiple sequence alignment of these pectate lyase protein sequences from different source organisms showed conserved regions at different stretches, with maximum homology at amino acid residues 439–467, 715–816, and 829–910, which could be used for designing degenerate primers or probes specific for pectate lyases. The motif analysis revealed a conserved Pec_Lyase_C domain uniformly observed in all pectate lyases irrespective of source, suggesting its possible role in structural and enzymatic functions. 1. Introduction The enzymes that hydrolyze pectic substances, which are ubiquitous in the plant kingdom and form major components of the middle lamella, are referred to as pectinases. The production, purification, biochemical characterization, and application of pectinases have been extensively reviewed [1–10]. The pectinases include polygalacturonases, pectic esterases, pectin lyases, and pectate lyases depending on their mode of action [1]. Pectate lyase (PL, EC 4.2.2.2) cleaves the α-1,4 glycosidic bonds of polygalacturonic acid via a β-elimination reaction, producing an unsaturated 4,5 bond at the nonreducing end of the polysaccharide and generating 4,5-unsaturated oligogalacturonates. Pectate lyase is widely distributed in diverse families of microorganisms and plants.
The important bacterial sources include Erwinia carotovora, Bacillus polymyxa, Klebsiella, Yersinia, Cytophaga, Pseudomonas, and Xanthomonas, while among fungi Aspergillus, Fusarium, and Penicillium are the most predominant sources [9, 11–14]. A number of pectate lyase genes have been cloned, sequenced, and expressed from different source organisms, namely, bacteria [15–22], fungi [23–25], yeast [26], nematode [27] and plants [14, 28]. The three-dimensional structures of various extracellular pectate lyases have been reported [29–36]. The pectate lyases, in general, have a parallel β-helix domain formed by parallel β-strands folded into a large right-handed helix and a major loop region. Amino acid sequence homology-based classification of pectate lyases into distinct families suggesting the possible
Similar rates of protein adaptation in Drosophila miranda and D. melanogaster, two species with different current effective population sizes
Doris Bachtrog
BMC Evolutionary Biology, 2008, DOI: 10.1186/1471-2148-8-334
Abstract: Here I study patterns of polymorphism and divergence at 91 X-linked loci in D. miranda, a species with a roughly 5-fold smaller effective population size than D. melanogaster. Surprisingly, I find a similar fraction of amino-acid mutations being driven to fixation by positive selection in D. miranda and D. melanogaster. Genes with higher rates of amino-acid evolution show lower levels of neutral diversity, a pattern predicted by recurrent adaptive protein evolution. I fit a hitchhiking model to patterns of polymorphism in D. miranda and D. melanogaster and estimate an order of magnitude higher selection coefficients for beneficial mutations in D. miranda. This analysis suggests that effective population size may not be a major determinant in rates of protein adaptation. Instead, adaptation may not be mutation-limited, or the distribution of fitness effects for beneficial mutations might differ vastly between different species or populations. Alternative explanations such as biases in estimating the fraction of beneficial mutations or slightly deleterious mutation models are also discussed. Researchers have made considerable progress in recent years to quantify rates of adaptive evolution in the genome using population variability data [1-6]. Many studies aimed at detecting adaptive evolution have applied the McDonald-Kreitman (MK) test [7] or modifications of it, which contrasts the number of polymorphisms within a species to the number of substitutions between species at two classes of sites, a putatively neutral and a putatively selected class. In protein-coding sequences these classes are usually synonymous and replacement sites [7]. Several members in the Drosophila melanogaster species group show high rates of adaptive amino-acid evolution. Using the MK test and its extensions, about half (and up to 95%) of all amino-acid mutations fixed between species are inferred to be driven by positive selection [1-4].
Some uncertainty in estimates of α, the fraction of amino-
An analysis of mobile genetic elements in three Plasmodium species and their potential impact on the nucleotide composition of the P. falciparum genome
Pierre M Durand, Andries J Oelofse, Theresa L Coetzer
BMC Genomics, 2006, DOI: 10.1186/1471-2164-7-282
Abstract: Whole genome analysis was performed using bioinformatic methods. Forty potential protein encoding sequences with features of transposable elements were identified in P. vivax, eight in P. y. yoelii and only six in P. falciparum. Further investigation of the six open reading frames in P. falciparum revealed that only one is potentially an active mobile genetic element. Most of the open reading frames identified in all three species are hypothetical proteins. Some represent annotated host proteins such as the putative telomerase reverse transcriptase genes in P. y. yoelii and P. falciparum. One of the P. vivax open reading frames identified in this study demonstrates similarity to telomerase reverse transcriptase and we conclude it to be the orthologue of this gene. There is a divergence in the frequencies of mobile genetic elements in the three Plasmodium species investigated. Despite the limitations of whole genome analytical methods, it is tempting to speculate that mobile genetic elements might have been a driving force behind the compositional bias of the P. falciparum genome. Mobile genetic elements (MGEs) play a fundamental role as drivers of genome evolution, shaping both genes and genomes and often constitute a large fraction of the genome (for a review of mobile elements and genome evolution see [1,2]). The mutagenic effects of MGE behaviour are well documented and include a spectrum, from point mutations to whole genome restructuring. In addition, MGEs have occasionally become "domesticated" and evolved to fulfill essential functions in genome dynamics e.g. telomerase [2]. Consequently, MGEs and their derivatives have been identified in almost all organisms. Laboratory evidence has repeatedly demonstrated that MGEs can have either a beneficial [3] or detrimental [4] effect on the host's fitness depending on the downstream effects of transposition.
To counteract the detrimental effects, some organisms have developed protective mechanisms against invading MGEs,
Scipio: Using protein sequences to determine the precise exon/intron structures of genes and their orthologs in closely related species
Oliver Keller, Florian Odronitz, Mario Stanke, Martin Kollmar, Stephan Waack
BMC Bioinformatics, 2008, DOI: 10.1186/1471-2105-9-278
Abstract: Scipio is a tool based on the alignment program BLAT to determine the precise gene structure given a protein sequence and a genome sequence. It identifies intron-exon borders and splice sites and is able to cope with sequencing errors and genes spanning several contigs in genomes that have not yet been assembled to supercontigs or chromosomes. Instead of producing a set of hits with varying confidence, Scipio gives the user a coherent summary of locations on the genome that code for the query protein. The output contains information about discrepancies that may result from sequencing errors. Scipio has also successfully been used to find homologous genes in closely related species. Scipio was tested with 979 protein queries against 16 arthropod genomes (intra species search). For cross-species annotation, Scipio was used to annotate 40 genes from Homo sapiens in the primates Pongo pygmaeus abelii and Callithrix jacchus. The prediction quality of Scipio was tested in a comparative study against that of BLAT and the well established program Exonerate. Scipio is able to precisely map a protein query onto a genome. Even in cases when there are many sequencing errors, or when incomplete genome assemblies lead to hits that stretch across multiple target sequences, it very often provides the user with the correct determination of intron-exon borders and splice sites, showing an improved prediction accuracy compared to BLAT and Exonerate. Apart from being able to find genes in the genome that encode the query protein, Scipio can also be used to annotate genes in closely related species. In the post-genome era, sequence data is the entry point for many studies. Often, it is essential to obtain the correct genomic DNA sequences of eukaryotic genes because of the information contained in non-coding regions. For example, the intron regions contain important sites for the regulation of gene transcription, like enhancers, repressors, and silencers [1].
Transcription initiator seque
Analysis of the hybrid proline-rich protein families from seven plant species suggests rapid diversification of their sequences and expression patterns
Lenka Dvořáková, Fatima Cvrčková, Lukáš Fischer
BMC Genomics, 2007, DOI: 10.1186/1471-2164-8-412
Abstract: We have performed a phylogenetic analysis of HyPRPs from seven plant species, including representatives of gymnosperms and both monocot and dicot angiosperms. Every species studied possesses a large family of 14–52 HyPRPs. Angiosperm HyPRPs exhibit signs of recent major diversification involving, at least in Arabidopsis and rice, several independent tandem gene multiplications. A distinct subfamily of relatively well-conserved C-type HyPRPs, often with long hydrophobic PR domains, has been identified. In most gymnosperm (pine) HyPRPs, diversity appears within the C-type group, while angiosperms have only a few well-conserved C-type representatives. Atypical (glycine-rich or extremely short) N-terminal domains apparently evolved independently in multiple lineages of the HyPRP family, possibly via inversion or loss of sequences encoding proline-rich domains. Expression profiles of potato and Arabidopsis HyPRP genes exhibit instances of both overlapping and complementary organ distribution. The diversified non-C-type HyPRP genes from recently amplified chromosomal clusters in Arabidopsis often share their specialized expression profiles. C-type genes have broader expression patterns in both species (potato and Arabidopsis), although orthologous genes exhibit some differences. HyPRPs represent a dynamically evolving protein family apparently unique to seed plants. We suggest that ancestral HyPRPs with long proline-rich domains produced the current diversity through ongoing gene duplications accompanied by shortening, modification or loss of the proline-rich domains. Most of the diversity in gymnosperms and angiosperms originates from different branches of the HyPRP family.
Rapid sequence diversification is consistent with only limited requirements for structure conservation and, together with high variability of gene expression patterns, limits the interpretation of any functional study focused on a single HyPRP gene or a couple of HYPRP genes in single plant specie
Word Decoding of Protein Amino Acid Sequences with Availability Analysis: A Linguistic Approach
Kenta Motomura, Tomohiro Fujita, Motosuke Tsutsumi, Satsuki Kikuzato, Morikazu Nakamura, Joji M. Otaki
PLOS ONE, 2012, DOI: 10.1371/journal.pone.0050039
Abstract: The amino acid sequences of proteins determine their three-dimensional structures and functions. However, how sequence information is related to structures and functions is still enigmatic. In this study, we show that at least a part of the sequence information can be extracted by treating amino acid sequences of proteins as a collection of English words, based on a working hypothesis that amino acid sequences of proteins are composed of short constituent amino acid sequences (SCSs) or “words”. We first confirmed that the English language highly likely follows Zipf's law, a special case of power law. We found that the rank-frequency plot of SCSs in proteins exhibits a similar distribution when low-rank tails are excluded. In comparison with natural English and “compressed” English without spaces between words, amino acid sequences of proteins show larger linear ranges and smaller exponents with heavier low-rank tails, demonstrating that the SCS distribution in proteins is largely scale-free. A distribution pattern of SCSs in proteins is similar among species, but species-specific features are also present. Based on the availability scores of SCSs, we found that sequence motifs are enriched in high-availability sites (i.e., “key words”) and vice versa. In fact, the highest availability peak within a given protein sequence often directly corresponds to a sequence motif. The amino acid composition of high-availability sites within motifs is different from that of entire motifs and all protein sequences, suggesting the possible functional importance of specific SCSs and their compositional amino acids within motifs. We anticipate that our availability-based word decoding approach is complementary to sequence alignment approaches in predicting functionally important sites of unknown proteins from their amino acid sequences.
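The rank-frequency analysis mentioned in this abstract can be sketched as follows. The example treats fixed-length substrings as stand-ins for the short constituent sequences (the abstract does not say how SCSs are delimited, so the fixed length `k` is an assumption), ranks them by frequency, and estimates the slope of the log-log rank-frequency plot by ordinary least squares. Under Zipf's law in its classic form that slope is close to -1. The function names are hypothetical.

```python
import math
from collections import Counter


def rank_frequency(seq, k=3):
    """Return substring counts sorted from most to least frequent.

    Assumption: "words" are overlapping substrings of fixed length k.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return sorted(counts.values(), reverse=True)


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    freqs must be positive and contain at least two values; rank 1 is
    the first element.
    """
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    # Counts proportional to 1/rank follow Zipf's law exactly.
    ideal = [60, 30, 20, 15, 12, 10]
    print(round(loglog_slope(ideal), 6))
    print(rank_frequency("ABABAB", k=2))
```

A simple least-squares fit over all ranks is used here only for brevity; the abstract notes that the low-rank tails were excluded before comparing distributions, which this sketch does not attempt.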
Mare’s milk: composition and protein fraction in comparison with different milk species
Klemen Potočnik, Vesna Gantner, Krešimir Kuterovac, Angela Cividini
Mljekarstvo, 2011
Abstract: The use of mare’s milk as a functional food, especially for children intolerant to cow’s milk and for people with neurodermitis, allergies, and similar disorders who wish to improve their quality of life, has been fiercely debated in recent decades, but no scientific studies have supported such use. The objectives of this study were to determine the similarities of mare’s milk to the milk of ruminants (cattle, sheep, and goat) and to human milk in terms of milk composition and protein fractions, namely whey proteins, caseins, and micelle size. All differences were discussed with regard to the use of mare’s milk in the human diet and compared with the milk usually used in human nutrition. In composition, mare’s milk is similar to human milk in crude protein, salt, and lactose content, but it has a significantly lower fat content. The fractions of the main proteins are similar in human and mare’s milk, except for casein nitrogen (casein N), whose content in human milk is half that in mare’s milk. The casein N content of all ruminants’ milk differs much more. Only for true whey N and non-protein nitrogen (NPN) does goat milk have a content similar to that of human and mare’s milk. The casein content is lowest in human milk; it is three times greater in mare’s milk, six to seven times greater in goat’s and cow’s milk, and more than 10 times greater in sheep’s milk. In many components and fractions, mare’s milk is more similar to human milk than is the milk of ruminants. A detailed comparison of protein fractions shows quite large differences between the milk of different species. More studies and clinical research are needed before the use of mare’s milk in the human diet as a functional food can be recommended on a scientific basis.