oalib
Search Results: 1 - 10 of 100 matches for " "
All listed articles are free for downloading (OA Articles)
Multifractal characterisation of length sequences of coding and noncoding segments in a complete genome
Zu-Guo Yu,Vo Anh,Ka-Sing Lau
Physics , 2001, DOI: 10.1016/S0378-4371(01)00391-0
Abstract: The coding and noncoding length sequences constructed from a complete genome are characterised by multifractal analysis. The dimension spectrum $D_{q}$ and its derivative, the 'analogous' specific heat $C_{q}$, are calculated for the coding and noncoding length sequences of bacteria, where $q$ is the moment order of the partition sum of the sequences. From the shape of the $D_{q}$ and $C_{q}$ curves, it is seen that there exists a clear difference between the coding/noncoding length sequences of all organisms considered and a completely random sequence. The complexity of noncoding length sequences is higher than that of coding length sequences for bacteria. Almost all $D_{q}$ curves for coding length sequences are flat, so their multifractality is small whereas almost all $D_{q}$ curves for noncoding length sequences are multifractal-like. We propose to characterise the bacteria according to the types of the $C_{q}$ curves of their noncoding length sequences.
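The dimension spectrum described in this abstract can be illustrated with a minimal sketch. It assumes the standard box-counting definition of the generalised dimension: the length sequence is normalised into a probability measure, the partition sum is taken over boxes of consecutive items, and $D_{q}$ is estimated as a regression slope against the log of the box size. The function names (`generalized_dimension`, `box_masses`, `slope`) and the choice of box sizes are hypothetical and are not taken from the paper.

```python
import math

def box_masses(measure, box):
    """Total measure of each block of `box` consecutive items."""
    return [sum(measure[i:i + box]) for i in range(0, len(measure), box)]

def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def generalized_dimension(lengths, q, boxes=(1, 2, 4, 8, 16)):
    """Estimate D_q of a sequence of positive lengths (illustrative only)."""
    total = float(sum(lengths))
    measure = [x / total for x in lengths]   # normalise to a probability measure
    n = len(measure)
    xs, ys = [], []
    for box in boxes:
        masses = [m for m in box_masses(measure, box) if m > 0]
        xs.append(math.log(box / n))         # log of the relative box size
        if q == 1:
            # q = 1 is the information dimension (limit of the general formula)
            ys.append(sum(m * math.log(m) for m in masses))
        else:
            z = sum(m ** q for m in masses)  # partition sum of moment order q
            ys.append(math.log(z) / (q - 1))
    return slope(xs, ys)

flat = [5] * 64          # all segments the same length
spiky = [1, 100] * 32    # alternating short and long segments
print(generalized_dimension(flat, 2), generalized_dimension(spiky, 2))
```

A sequence of identical lengths gives a flat spectrum ($D_{q} = 1$ for every $q$), while an uneven sequence gives a $D_{q}$ that falls below 1 for $q > 0$, which is the kind of flat versus multifractal-like contrast the abstract refers to.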
Transcriptome screen for fast evolving genes by Inter-Specific Selective Hybridization (ISSH)
Juan I Montoya-Burgos, Aurélia Foulon, Ilham Bahechar
BMC Genomics , 2010, DOI: 10.1186/1471-2164-11-126
Abstract: We demonstrate the efficiency of the ISSH method by generating a brain cDNA library enriched in fast evolving transcripts of a non-model catfish species as well as a control, non-enriched library. Our results indicate that the enriched library contains effectively more fast evolving sequences than the control library. Gene annotation analyses also indicate enrichment in genes with low expression levels and non-ubiquitously expressed genes, both categories encompassing the majority of fast evolving genes. Furthermore, most of the identified transcripts show higher sequence divergence between two closely related catfish species as compared to recognized fast evolving DNA markers. The ISSH method offers a simple, inexpensive and efficient way to screen the transcriptome for isolating fast evolving genes. This method opens new opportunities in the investigation of biological mechanisms that include fast evolving genes, such as the evolution of lineage specific processes and traits responsible for species adaptation to their environment. Fast evolving DNA sequences are used for answering a broad range of biological questions relative to population processes and phylogeography [e.g. [1]], species diversification [e.g. [2,3]], conservation biology [4] and also genome or phenotype mapping [e.g. [5]]. However, due to the very same intrinsic quality for which they are looked for, i.e. their high evolutionary rate, fast evolving DNA sequences display "lineage specific" changes and therefore require de novo development each time a new group of non-model organisms is being investigated. Despite various methodologies targeted toward the isolation of unspecific polymorphic DNA fragments [e.g. [6-8]] the identification and the isolation of fast evolving DNA sequences in non-model organisms is still laborious and expensive, making it a major impediment to the routine analysis of multiple loci on many taxa. The isolation of fast evolving genes has gained new motivation and attention as
Exploration of Noncoding Sequences in Metagenomes
Fabián Tobar-Tosse, Adrián C. Rodríguez, Patricia E. Vélez, María M. Zambrano, Pedro A. Moreno
PLOS ONE , 2013, DOI: 10.1371/journal.pone.0059488
Abstract: Environment-dependent genomic features have been defined for different metagenomes, whose genes and their associated processes are related to specific environments. Identification of ORFs and their functional categories is the most common method for associating functional and environmental features. However, this analysis based on finding ORFs misses noncoding sequences and, therefore, some metagenome regulatory or structural information could be discarded. In this work we analyzed 23 whole metagenomes, including coding and noncoding sequences, using the following sequence patterns: (G+C) content, Codon Usage (Cd), Trinucleotide Usage (Tn), and functional assignments for ORF prediction. Herein, we present evidence of a high proportion of noncoding sequences discarded in common similarity-based methods in metagenomics, and the kind of relevant information present in those. We found a high density of trinucleotide repeat sequences (TRS) in noncoding sequences, with a regulatory and adaptive function for metagenome communities. We present associations between trinucleotide values and gene function, where metagenome clustering correlates with microorganism adaptations and kinds of metagenomes. We propose here that noncoding sequences have relevant information to describe metagenomes that could be considered in a whole metagenome analysis in order to improve their organization, classification protocols, and their relation with the environment.
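Two of the sequence patterns named in this abstract, (G+C) content and trinucleotide usage, can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' pipeline: trinucleotides are counted in overlapping windows over a single strand, windows containing ambiguous bases are skipped, and the function names `gc_content` and `trinucleotide_usage` are hypothetical.

```python
from collections import Counter

BASES = set("ACGT")

def gc_content(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    known = sum(seq.count(b) for b in "ACGT")
    if known == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / known

def trinucleotide_usage(seq):
    """Relative frequency of each trinucleotide, using overlapping windows.

    Assumption: every window of three consecutive bases is counted, with no
    reading frame; windows containing characters other than A, C, G, T are
    skipped.
    """
    seq = seq.upper()
    windows = (seq[i:i + 3] for i in range(len(seq) - 2))
    counts = Counter(w for w in windows if set(w) <= BASES)
    total = sum(counts.values())
    return {tri: n / total for tri, n in counts.items()} if total else {}

read = "CAGCAGCAGTTNACG"
print(gc_content(read))
print(sorted(trinucleotide_usage(read).items(), key=lambda kv: -kv[1])[:3])
```

Because both functions return normalised values, the same profile can be computed for coding and noncoding regions of different sizes and compared directly.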
Inferring Noncoding RNA Families and Classes by Means of Genome-Scale Structure-Based Clustering
Sebastian Will,Kristin Reiche,Ivo L Hofacker,Peter F Stadler,Rolf Backofen
PLOS Computational Biology , 2007, DOI: 10.1371/journal.pcbi.0030065
Abstract: The RFAM database defines families of ncRNAs by means of sequence similarities that are sufficient to establish homology. In some cases, such as microRNAs and box H/ACA snoRNAs, functional commonalities define classes of RNAs that are characterized by structural similarities, and typically consist of multiple RNA families. Recent advances in high-throughput transcriptomics and comparative genomics have produced very large sets of putative noncoding RNAs and regulatory RNA signals. For many of them, evidence for stabilizing selection acting on their secondary structures has been derived, and at least approximate models of their structures have been computed. The overwhelming majority of these hypothetical RNAs cannot be assigned to established families or classes. We present here a structure-based clustering approach that is capable of extracting putative RNA classes from genome-wide surveys for structured RNAs. The LocARNA (local alignment of RNA) tool implements a novel variant of the Sankoff algorithm that is sufficiently fast to deal with several thousand candidate sequences. The method is also robust against false positive predictions, i.e., a contamination of the input data with unstructured or nonconserved sequences. We have successfully tested the LocARNA-based clustering approach on the sequences of the RFAM-seed alignments. Furthermore, we have applied it to a previously published set of 3,332 predicted structured elements in the Ciona intestinalis genome (Missal K, Rose D, Stadler PF (2005) Noncoding RNAs in Ciona intestinalis. Bioinformatics 21 (Supplement 2): i77–i78). In addition to recovering, e.g., tRNAs as a structure-based class, the method identifies several RNA families, including microRNA and snoRNA candidates, and suggests several novel classes of ncRNAs for which to date no representative has been experimentally characterized.
A fast divide-and-conquer algorithm for indexing human genome sequences
Woong-Kee Loh,Yang-Sae Moon,Wookey Lee
Computer Science , 2010, DOI: 10.1587/transinf.E94.D.1369
Abstract: Since the release of human genome sequences, one of the most important research issues has been indexing the genome sequences, and the suffix tree is most widely adopted for that purpose. The traditional suffix tree construction algorithms suffer severe performance degradation due to the memory bottleneck problem. The recent disk-based algorithms also show limited performance improvement due to random disk accesses. Moreover, they do not fully utilize recent CPUs with multiple cores. In this paper, we propose a fast algorithm based on a 'divide-and-conquer' strategy for indexing the human genome sequences. Our algorithm almost eliminates random disk accesses by accessing the disk in units of contiguous chunks. In addition, our algorithm fully utilizes multi-core CPUs by dividing the genome sequences into multiple partitions and then assigning each partition to a different core for parallel processing. Experimental results show that our algorithm outperforms the previous fastest DIGEST algorithm by up to 3.5 times.
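The partition-then-process idea in this abstract can be sketched as follows. This is not the authors' algorithm: it builds a suffix array instead of a suffix tree, keeps everything in memory, and splits the suffixes by their leading characters so that each partition can be sorted independently. The names `build_suffix_array` and `sort_partition`, and the parameters `prefix_len` and `workers`, are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def sort_partition(text, positions):
    """Sort one partition of suffix start positions lexicographically."""
    # Slicing copies the suffix, which is fine for a toy input only.
    return sorted(positions, key=lambda i: text[i:])

def build_suffix_array(text, prefix_len=1, workers=4):
    """Divide: bucket suffixes by their first `prefix_len` characters.
    Conquer: sort each bucket independently.
    Combine: concatenate the buckets in prefix order.
    """
    buckets = defaultdict(list)
    for i in range(len(text)):
        buckets[text[i:i + prefix_len]].append(i)
    keys = sorted(buckets)
    # Buckets share no data dependencies, so they can be handed to separate
    # workers. Threads are used here for simplicity; in CPython they do not
    # give true multi-core speed-up, which would need processes instead.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda k: sort_partition(text, buckets[k]), keys))
    return [i for part in parts for i in part]

print(build_suffix_array("banana", prefix_len=2))
```

Concatenating in sorted prefix order is valid because every suffix in one bucket compares below every suffix in a later bucket, so no merge step is needed after the partitions are sorted.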
Evolving Trends in the Hepatitis C Virus Molecular Epidemiology Studies: From the Viral Sequences to the Human Genome
Julieta Trinks,Adrián Gadano,Pablo Argibay
Epidemiology Research International , 2012, DOI: 10.1155/2012/856810
Abstract: Hepatitis C virus (HCV) represents a major worldwide public health problem. The search for the key molecular biomarkers that may provide insight on the basis of the differences in disease progression, severity, and response to therapy is crucial for understanding the natural history of HCV, for estimating the burden of infection and for developing preventive interventions. Initially, molecular epidemiology studies have focused on studying the viral genetic diversity (genotypes, genetic variants, specific nucleotide and amino acid substitutions). However, the clinical heterogeneities of HCV infection and the imperfect predictability of the response to treatment have suggested the need to search for host genetic biomarkers. This led to the discovery of genetic polymorphisms playing a major role in the evolution of infection, as well as in treatment response and adverse effects, such as IL-28B, ITPA, and IP-10. As a consequence, nowadays the focus of molecular epidemiology studies has turned from the viral to the human genome. This paper will cover recent reports on the subject describing the most relevant viral as well as host genetic risk factors analyzed by past and current HCV molecular epidemiology studies.
1. Introduction
HCV represents a major health problem with approximately 3% of the world population (that is, more than 170 million people) infected. While only 20–30% of individuals exposed to HCV recover spontaneously, the remaining 70–80% develop chronic HCV infection (CHC) [1]. Moreover, 3–11% of those people will develop liver cirrhosis (LC) within 20 years [2], with associated risks of liver failure and hepatocellular carcinoma (HCC) [3] which are the leading indications of liver transplantation in industrialized countries [4]. The socioeconomic impact of HCV infection is therefore tremendous and the burden of the disease is expected to increase around the world as the disease progresses in patients who contracted HCV years ago.
Since the discovery of HCV more than 20 years ago [5], epidemiological studies have described complex patterns of infection concerning not only the worldwide prevalence of this virus but also its clinical presentation and its therapeutic response. HCV presents highly variable local prevalence rates between countries and within countries [6]; for example, in Argentina the overall prevalence of HCV infection is close to 2%, but higher rates have been reported in different small rural communities (5.7–4.9%) [7, 8]. The outcome of HCV infection is—as previously stated—heterogeneous ranging from an asymptomatic self-limiting
UFO: a web server for ultra-fast functional profiling of whole genome protein sequences
Peter Meinicke
BMC Genomics , 2009, DOI: 10.1186/1471-2164-10-409
Abstract: Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. Considering the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a considerable speed up in the range of four orders of magnitude. For an average size prokaryotic genome, the computation of a functional profile together with its comparison typically requires about 10 seconds of processing time. For the first time the UFO web server makes it possible to get a quick overview on the functional inventory of newly sequenced organisms. The genome scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address. The assignment of genes to certain functional categories is a central task in genome annotation. The distribution of assignments, i.e. the functional profile, provides a highly informative summary of a genome. Functional profiling plays a key role in comparative genomics for studying aspects of systems biology on a genome wide scale [1]. Without the restriction of DNA sequencing to culturable organisms, metagenomics allows the study of the genomic potential of whole microbial communities. Functional profiling of metagenomes is an essential tool for comparative analysis of microbial ecosystems [2].
In the context of functional genomics, gene clusters and protein domains are widely used for homology-based annotation. Both appr
Two subjects in the field of genome research: noncoding sequence and biological network
CHEN Run-sheng
Acta Biophysica Sinica (生物物理学报) , 2007
Abstract: This article summarizes two subjects in the field of genome research that are progressing quickly: one is noncoding sequences, especially noncoding RNAs and their genes, and the other is bio-networks and systems biology. The impact of progress in these two subjects on bioinformatics is also discussed.
Catalogues of mammalian long noncoding RNAs: modest conservation and incompleteness
Ana C Marques, Chris P Ponting
Genome Biology , 2009, DOI: 10.1186/gb-2009-10-11-r124
Abstract: Our analyses reveal lincRNA and macroRNA exon sequences to be subject to the same relatively low degree of sequence constraint. Nonetheless, our observations are consistent with the functionality of a fraction of ncRNA in these sets, with up to a quarter of ncRNA exons having evolved significantly slower than neighboring neutral sequence. The more tissue-specific macroRNAs are enriched in predicted RNA secondary structures and thus may often act in trans, whereas the more highly and broadly expressed lincRNAs appear more likely to act in the cis-regulation of adjacent transcription factor genes. Taken together, our results indicate that each of the two ncRNA catalogues unevenly and lightly samples the true, much larger, ncRNA repertoire of the mouse. The eukaryotic transcriptome now appears far more complex and extensive than previously anticipated. Transcription units are frequently interleaved [1] and transcripts are produced from both coding and noncoding stretches of the genome, including intergenic, intronic and promoter regions [2,3], resulting in a vast array of RNA molecules varying in size, abundance and protein-coding potential. For example, of the 10% of human euchromatic nucleotides that appear to be stably transcribed, more than half lie outside protein-coding gene annotations [2]. Widespread non-protein-coding RNA transcription is evident in many eukaryotic genomes, including mouse, fruitfly and plants [4]. Despite the ever increasing number of long (>200 nucleotide) noncoding RNA (ncRNA) transcripts being identified, the functions of most remain to be determined. Indeed, their biological significance remains controversial [5]. Arguing in favor of their biological relevance are observations that ncRNAs often show variable, perhaps regulated, spatiotemporal expression patterns [6,7], and that their sequences are better conserved with respect to substitutions, insertions and deletions than are putative neutrally evolving sequences [8].
Many long ncRNAs whos
Fast evolving 18S rRNA sequences from Solenogastres (Mollusca) resist standard PCR amplification and give new insights into mollusk substitution rate heterogeneity
Achim Meyer, Christiane Todt, Nina T Mikkelsen, Bernhard Lieb
BMC Evolutionary Biology , 2010, DOI: 10.1186/1471-2148-10-70
Abstract: We report here the first authentic 18S genes of three Solenogastres species (Mollusca), each possessing a unique sequence composition with regions conspicuously rich in guanine and cytosine. For these GC-rich regions we calculated strong secondary structures. The observed high intra-molecular forces hamper standard amplification and appear to increase formation of chimerical sequences caused by contaminating foreign DNAs from potential prey organisms. In our analyses, contamination was avoided by using RNA as a template. Indication for contamination of previously published Solenogastres sequences is presented. Detailed phylogenetic analyses were conducted using RNA specific models that account for compensatory substitutions in stem regions. The extreme morphological diversity of mollusks is mirrored in the molecular 18S data and shows elevated substitution rates mainly in three higher taxa: true limpets (Patellogastropoda), Cephalopoda and Solenogastres. Our phylogenetic tree based on 123 species, including representatives of all mollusk classes, shows limited resolution at the class level but illustrates the pitfalls of artificial groupings formed due to shared biased sequence composition. The small subunit (SSU) 18S rRNA gene is one of the most frequently used genes in phylogenetic studies (see below) and an important marker for random target PCR in environmental biodiversity screening [1]. In general, rRNA gene sequences are easy to access due to highly conserved flanking regions allowing for the use of universal primers [2]. Their repetitive arrangement within the genome provides excessive amounts of template DNA for PCR, even in smallest organisms. The 18S gene is part of the ribosomal functional core and is exposed to similar selective forces in all living beings [3].
Thus, when the first large-scale phylogenetic studies based on 18S sequences were published - first and foremost Field et al.'s [4] phylogeny of the animal kingdom - the gene was celebrated as the