oalib
Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints
Robin D Dowell, Sean R Eddy
BMC Bioinformatics, 2006, DOI: 10.1186/1471-2105-7-400
Abstract: We use probabilistic models (pair stochastic context-free grammars, pairSCFGs) as a unifying framework for scoring pairwise alignment and folding. A constrained version of the pairSCFG structural alignment algorithm was developed which assumes knowledge of a few confidently aligned positions (pins). These pins are selected based on the posterior probabilities of a probabilistic pairwise sequence alignment. Pairwise RNA structural alignment improves on structure prediction accuracy relative to single sequence folding. Constraining on alignment is a straightforward method of reducing the runtime and memory requirements of the algorithm. Five practical implementations of the pairwise Sankoff algorithm – this work (Consan), David Mathews' Dynalign, Ian Holmes' Stemloc, Ivo Hofacker's PMcomp, and Jan Gorodkin's FOLDALIGN – have comparable overall performance with different strengths and weaknesses. RNA secondary structure can be predicted accurately from sequence data alone. For example, the predicted secondary structure of ribosomal RNA has been essentially confirmed by recent crystal structures; 97–98% of the predicted base pairs are confirmed by experimental structures [1]. The trouble is that rRNA predictions were refined by experts over twenty years, ultimately utilizing data from about 7000 small subunit rRNA sequences and 1050 large subunit rRNA sequences [1]. As there are many RNA structures of biological interest [2,3], it is important to find computational means of accelerating, automating, and improving RNA secondary structure prediction [4]. There are two main sources of information for RNA secondary structure prediction. The most accurate means of prediction is comparative analysis [5-7], which uses evolutionary information. Homologous RNAs tend to conserve a common base-paired secondary structure. 
Important base pairing interactions are conserved by compensatory mutations and compensatory mutations induce detectable pairwise sequence correlations between posit
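The pin idea in the Consan abstract above can be illustrated with a minimal sketch. It assumes a precomputed matrix of posterior match probabilities, where posterior[i][j] is the probability that position i of one sequence aligns with position j of the other. The function name select_pins, the 0.95 threshold, and the greedy co-linearity filter are illustrative assumptions, not the procedure Consan actually uses.

```python
def select_pins(posterior, threshold=0.95):
    """Pick confidently aligned position pairs ("pins") from a posterior matrix.

    posterior[i][j] is assumed to hold the posterior probability that
    position i of sequence X is aligned to position j of sequence Y.
    The threshold and the greedy filter are hypothetical choices.
    """
    # Collect every cell whose posterior probability clears the threshold.
    candidates = [
        (p, i, j)
        for i, row in enumerate(posterior)
        for j, p in enumerate(row)
        if p >= threshold
    ]
    # Consider the most confident candidates first.
    candidates.sort(reverse=True)

    pins = []
    for _, i, j in candidates:
        # Keep a candidate only if it is co-linear with every pin chosen so
        # far: it must lie strictly before or strictly after each one in
        # both sequences, so the pins never cross.
        if all((i < pi and j < pj) or (i > pi and j > pj) for pi, pj in pins):
            pins.append((i, j))
    return sorted(pins)


if __name__ == "__main__":
    toy = [
        [0.99, 0.00, 0.00],
        [0.00, 0.50, 0.00],
        [0.00, 0.00, 0.97],
    ]
    print(select_pins(toy))
```

Each pin splits the dynamic-programming problem into smaller independent regions, which is how constraining on alignment can reduce runtime and memory.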
Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign
Arif Harmanci, Gaurav Sharma, David H Mathews
BMC Bioinformatics, 2007, DOI: 10.1186/1471-2105-8-130
Abstract: The proposed technique eliminates manual parameter selection in Dynalign and provides significant computational time savings in comparison to prior constraints in Dynalign while simultaneously providing a small improvement in the structural prediction accuracy. Savings are also realized in memory. In experiments over a 5S RNA dataset with average sequence length of approximately 120 nucleotides, the method reduces computation by a factor of 2. The method performs favorably in comparison to other programs for pairwise RNA structure prediction, yielding better accuracy on average and requiring significantly fewer computational resources. Probabilistic analysis can be utilized in order to automate the determination of alignment constraints for pairwise RNA structure prediction methods in a principled fashion. These constraints can reduce the computational and memory requirements of these methods while maintaining or improving their accuracy of structural prediction. This extends the practical reach of these methods to longer sequences. The revised Dynalign code is freely available for download. With the widespread availability of data sets of genome and protein sequences, methods for analyzing the sequences to extract biologically salient information have emerged as powerful techniques in computational bioinformatics [1]. In this arena, comparative sequence analysis has proven extremely powerful, whereby sequence segments across different genomes are examined for similarities. Segments identified as similar represent evolutionarily conserved homologs and are deemed to be biologically significant due to their apparent preservation across the genomes. The postulated significance can then be tested with experiments, which can also help establish functional correlates. 
Because the biological experiments are time-consuming and expensive, the comparative analysis serves to improve efficiency by "pre-filtering" the relatively large genome to determine relatively small
Can Clustal-style progressive pairwise alignment of multiple sequences be used in RNA secondary structure prediction?
Amelia B Bellamy-Royds, Marcel Turcotte
BMC Bioinformatics, 2007, DOI: 10.1186/1471-2105-8-190
Abstract: The research presented here investigates the possibility of applying a progressive, pairwise approach to the alignment of multiple RNA sequences by simultaneously predicting an energy-optimized consensus secondary structure. We take an existing algorithm for finding the secondary structure common to two RNA sequences, Dynalign, and alter it to align profiles of multiple sequences. We then explore the relative successes of different approaches to designing the tree that will guide progressive alignments of sequence profiles to create a multiple alignment and prediction of conserved structure. We have found that applying a progressive, pairwise approach to the alignment of multiple ribonucleic acid sequences produces highly reliable predictions of conserved basepairs, and we have shown how these predictions can be used as constraints to improve the results of a single-sequence structure prediction algorithm. However, we have also discovered that the amount of detail included in a consensus structure prediction is highly dependent on the order in which sequences are added to the alignment (the guide tree), and that if a consensus structure does not have sufficient detail, it is less likely to provide useful constraints for the single-sequence method. The research presented here investigates the possibility of applying a progressive, pairwise approach to the alignment of multiple ribonucleic acid (RNA) sequences, in which the property being aligned is not the primary structure defined by the identity of the nucleotides, but the secondary structure created from base pair interactions. 
In RNA molecules whose function depends on their final, folded three-dimensional shape (such as those in ribosomes or spliceosome complexes), the secondary structure is more consistently conserved than the sequence. Although there exist dynamic-programming methods for predicting the secondary structure of a single sequence, the quality of the prediction can be significantly improved by using a
MCALIGN2: Faster, accurate global pairwise alignment of non-coding DNA sequences based on explicit models of indel evolution
Jun Wang, Peter D Keightley, Toby Johnson
BMC Bioinformatics, 2006, DOI: 10.1186/1471-2105-7-292
Abstract: Here we present an improved pair-hidden-Markov-Model (pair HMM) based method for performing global pairwise alignment of non-coding DNA sequences. The method uses an explicit model of indel length frequency distribution which can be specified, and allows any time reversible model of nucleotide substitution. The method uses a deterministic global optimiser to find the alignment with the highest posterior probability. We test MCALIGN2 in simulations, and compare it to a previous Monte Carlo based method (MCALIGN), to the pair HMM method of Knudsen and Miyamoto, and to a heuristic method (AVID) that performed very well in a previous simulation study. We show that the pair HMM methods have excellent performance for all combinations of parameter values we have considered. MCALIGN2 is up to ten times faster than MCALIGN. MCALIGN2 is more accurate in resolving indels given an accurate explicit model than heuristic methods, but is computationally slower. MCALIGN2 produces better quality alignments by explicitly using biological knowledge about the indel length distribution and time reversible models of nucleotide substitution. As a result, it can outperform other available sequence alignment methods for the cases we have considered to align non-coding DNA sequences. The advent of automated DNA sequencing methods has resulted in an enormous growth in the volume of sequence data deposited in public databases. The increasing availability of genome sequence data for many related organisms offers great opportunities to study gene function and genome evolution, but it also presents new challenges for DNA sequence analysis, especially for non-coding DNA sequences. For much of the past two decades, research in DNA sequence analysis has focused on protein-coding sequences, which account for only a very small proportion of the total genomic content in mammals, most other vertebrates, many invertebrates, and most plants [1]. 
For example, protein-coding gene sequences comprise as little a
A fast structural multiple alignment method for long RNA sequences
Yasuo Tabei, Hisanori Kiryu, Taishin Kin, Kiyoshi Asai
BMC Bioinformatics, 2008, DOI: 10.1186/1471-2105-9-33
Abstract: We propose a fast algorithm for multiple structural alignments of RNA sequences that is an extension of our pairwise structural alignment method (implemented in SCARNA). The accuracies of the implemented software, MXSCARNA, are at least as favorable as those of state-of-the-art algorithms that are computationally much more expensive in time and memory. The proposed method for structural alignment of multiple RNA sequences is fast enough for large-scale analyses with accuracies at least comparable to those of existing algorithms. The source code of MXSCARNA and its web server are available at http://mxscarna.ncrna.org. Non-coding RNAs (ncRNAs) are transcribed RNA molecules that do not encode proteins. Their functions often depend on their 3D structures rather than their primary sequences. The secondary structures of RNA sequences can be identified by various methods, including minimization of the free energy [1-3]. However, it is not always possible to obtain accurate secondary structures. More reliable predictions of the secondary structures are possible if we have a set of RNA sequences with a common secondary structure. For consensus structure prediction, RNAalifold [4], Pfold [5], and McCaskill-MEA [6] are applicable only to sets of aligned RNA sequences. Multiple alignment tools that consider only sequence similarities, e.g. ClustalW [7], Dialign [8], and T-Coffee [9], however, have limited accuracy for RNA sequences with low similarity. Simultaneous prediction of the common secondary structure and optimal alignment of RNA sequences is computationally quite expensive, even if pseudo-knotted structures are excluded. For example, the strict algorithm of Sankoff [10] requires O(L^{3N}) in time and O(L^{2N}) in memory for N sequences of length L. 
Faster variants that restrict the distances between base pairs in the primary sequences have been proposed for pairwise alignments [11-14]. Although structural alignment of multiple RNA sequences with reasonable computational co
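To make the Sankoff cost quoted in the abstract above concrete, here is a short worked calculation. It reads the bounds as O(L^{3N}) time and O(L^{2N}) memory, and the values N = 2 and L = 120 are illustrative assumptions chosen only for the arithmetic; constant factors are ignored.

```latex
% Pairwise case (N = 2) of the Sankoff bounds, illustrative length L = 120
\begin{align*}
\text{time:}   \quad & O(L^{3N}) = O(L^{6}), & 120^{6} &= 2\,985\,984\,000\,000 \approx 3.0 \times 10^{12} \\
\text{memory:} \quad & O(L^{2N}) = O(L^{4}), & 120^{4} &= 207\,360\,000 \approx 2.1 \times 10^{8}
\end{align*}
```

Under these assumptions, doubling L multiplies the pairwise time bound by 2^6 = 64 and the memory bound by 2^4 = 16, which is why variants that restrict base-pair distances or constrain the alignment are attractive.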
Mediator regulates non-coding RNA transcription at fission yeast centromeres
Michael Thorsen, Heidi Hansen, Michela Venturi, Steen Holmberg, Genevieve Thon
Epigenetics & Chromatin, 2012, DOI: 10.1186/1756-8935-5-19
Abstract: We found that the Med8-Med18-Med20 submodule of the Mediator complex is required for the transcriptional regulation of native centromeric dh and dg repeats and for the silencing of reporter genes inserted in centromeric heterochromatin. Mutations in the Med8-Med18-Med20 submodule did not alter Mediator occupancy at centromeres; however, they led to an increased recruitment of RNA polymerase II to centromeres and reduced levels of centromeric H3K9 methylation accounting for the centromeric desilencing. Further, we observed that Med18 and Med20 were required for efficient processing of dh transcripts into siRNA. Consistent with defects in centromeric heterochromatin, cells lacking Med18 or Med20 displayed elevated rates of mitotic chromosome loss. Our data demonstrate a role for the Med8-Med18-Med20 Mediator submodule in the regulation of non-coding RNA transcription at Schizosaccharomyces pombe centromeres. In wild-type cells this submodule limits RNA polymerase II access to the heterochromatic DNA of the centromeres. Additionally, the submodule may act as an assembly platform for the RNAi machinery or regulate the activity of the RNAi pathway. Consequently, Med8-Med18-Med20 is required for silencing of centromeres and proper mitotic chromosome segregation. Mediator is a large (approximately 1 MDa) protein complex that conveys regulatory signals to RNA polymerase II (Pol II). The Saccharomyces cerevisiae Mediator was the first to be characterized but Mediators have since then been described in many other species. A comparative genomics approach of approximately 70 eukaryotic genomes shows that although its exact subunit composition varies, Mediator is conserved across the eukaryotic kingdom [1]. 
The Schizosaccharomyces pombe Mediator consists of at least 20 subunits, all of which appear to have orthologues in Drosophila melanogaster, Caenorhabditis elegans and Homo sapiens [2]. Three distinct domains (head, middle and tail) have been identified by electron microscopy on
Meiotic Recombination Hotspots of Fission Yeast Are Directed to Loci that Express Non-Coding RNA
Wayne P. Wahls, Eric R. Siegel, Mari K. Davidson
PLOS ONE, 2008, DOI: 10.1371/journal.pone.0002887
Abstract: Background: Polyadenylated, mRNA-like transcripts with no coding potential are abundant in eukaryotes, but the functions of these long non-coding RNAs (ncRNAs) are enigmatic. In meiosis, Rec12 (Spo11) catalyzes the formation of dsDNA breaks (DSBs) that initiate homologous recombination. Most meiotic recombination is positioned at hotspots, but knowledge of the mechanisms is nebulous. In the fission yeast genome DSBs are located within 194 prominent peaks separated on average by 65-kbp intervals of DNA that are largely free of DSBs. Methodology/Principal Findings: We compared the genome-wide distribution of DSB peaks to that of polyadenylated ncRNA molecules of the prl class. DSB peaks map to ncRNA loci that may be situated within ORFs, near the boundaries of ORFs and intergenic regions, or most often within intergenic regions. Unconditional statistical tests revealed that this colocalization is non-random and robust (P ≤ 5.5×10^-8). Furthermore, we tested and rejected the hypothesis that the ncRNA loci and DSB peaks localize preferentially, but independently, to a third entity on the chromosomes. Conclusions/Significance: Meiotic DSB hotspots are directed to loci that express polyadenylated ncRNAs. This reveals an unexpected, possibly unitary mechanism for what directs meiotic recombination to hotspots. It also reveals a likely biological function for enigmatic ncRNAs. We propose specific mechanisms by which ncRNA molecules, or some aspect of RNA metabolism associated with ncRNA loci, help to position recombination protein complexes at DSB hotspots within chromosomes.
Improved accuracy of multiple ncRNA alignment by incorporating structural information into a MAFFT-based framework
Kazutaka Katoh, Hiroyuki Toh
BMC Bioinformatics, 2008, DOI: 10.1186/1471-2105-9-212
Abstract: We took a different approach from a straightforward extension of the Sankoff algorithm to the multiple alignments from the viewpoints of accuracy and time complexity. As a new option of the MAFFT alignment program, we developed a multiple RNA alignment framework, X-INS-i, which builds a multiple alignment with an iterative method incorporating structural information through two components: (1) pairwise structural alignments by an external pairwise alignment method such as SCARNA or LaRA and (2) a new objective function, Four-way Consistency, derived from the base-pairing probability of every sub-aligned group at every multiple alignment stage. The BRAliBASE benchmark showed that X-INS-i outperforms other methods currently available in the sum-of-pairs score (SPS) criterion. As a basis for predicting common secondary structure, the accuracy of the present method is comparable to or somewhat higher than that of the current leading methods such as RNA Sampler. The X-INS-i framework can be used for building a multiple RNA alignment from any combination of algorithms for pairwise RNA alignment and base-pairing probability. The source code is available at the webpage found in the Availability and requirements section. Multiple alignment is an important step in various phases of comparative studies of RNAs, such as the detection of common secondary structures from a set of homologous sequences and the preparation of an alignment as a query for database search tools including Infernal [1]. Since the discovery of functional non-coding RNAs (ncRNAs), the necessity for the incorporation of secondary structural information into a multiple RNA alignment has been recognized, and many efforts are being made toward this goal [2-14]. Secondary structure prediction and multiple RNA alignment are closely related to each other. 
According to Gardner and Giegerich [15], there are three possible plans to infer common secondary structures from a set of unaligned RNA sequences, align-then-fold
Noncoding RNA gene detection using comparative sequence analysis
Elena Rivas, Sean R Eddy
BMC Bioinformatics, 2001, DOI: 10.1186/1471-2105-2-8
Abstract: We describe a comparative sequence analysis algorithm for detecting novel structural RNA genes. The key idea is to test the pattern of substitutions observed in a pairwise alignment of two homologous sequences. A conserved coding region tends to show a pattern of synonymous substitutions, whereas a conserved structural RNA tends to show a pattern of compensatory mutations consistent with some base-paired secondary structure. We formalize this intuition using three probabilistic "pair-grammars": a pair stochastic context free grammar modeling alignments constrained by structural RNA evolution, a pair hidden Markov model modeling alignments constrained by coding sequence evolution, and a pair hidden Markov model modeling a null hypothesis of position-independent evolution. Given an input pairwise sequence alignment (e.g. from a BLASTN comparison of two related genomes) we classify the alignment into the coding, RNA, or null class according to the posterior probability of each class. We have implemented this approach as a program, QRNA, which we consider to be a prototype structural noncoding RNA genefinder. Tests suggest that this approach detects noncoding RNA genes with a fair degree of reliability. Some genes produce functional noncoding RNAs (ncRNAs) instead of coding for proteins [1,2]. For protein-coding genes, we have computational genefinding tools [3] that predict novel genes in genome sequence data with reasonable efficiency [4]. For ncRNA genes, there are as yet no general genefinding algorithms. The number and diversity of ncRNA genes remains poorly understood, despite the availability of many complete genome sequences. Gene discovery methods (whether experimental or computational) typically assume that the target is a protein coding gene that produces a messenger RNA. New noncoding RNA genes continue to be discovered by less systematic means, which makes it seem likely that a systematic RNA genefinding algorithm would be of use. 
Recent discoveries have inclu
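The classification step described in the QRNA abstract above (choose the coding, RNA, or null class by posterior probability) can be sketched as a small Bayes computation. This is a minimal illustration, not QRNA's code: it assumes the log-likelihood of the alignment under each of the three models has already been computed elsewhere, and the function name class_posteriors, the class labels, the uniform default prior, and the example numbers are all hypothetical.

```python
import math


def class_posteriors(log_likelihoods, priors=None):
    """Turn per-model log-likelihoods into posterior class probabilities.

    log_likelihoods maps a class name to log P(alignment | model).
    priors maps a class name to P(model); a uniform prior is assumed
    when none is given (an illustrative choice).
    """
    names = list(log_likelihoods)
    if priors is None:
        priors = {name: 1.0 / len(names) for name in names}

    # Work in log space: log P(alignment | model) + log P(model).
    joint = {name: log_likelihoods[name] + math.log(priors[name]) for name in names}

    # Subtract the maximum before exponentiating to avoid underflow.
    peak = max(joint.values())
    weights = {name: math.exp(value - peak) for name, value in joint.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


if __name__ == "__main__":
    # Hypothetical log-likelihoods for one pairwise alignment.
    scores = {"RNA": -100.0, "coding": -105.0, "null": -110.0}
    posteriors = class_posteriors(scores)
    best = max(posteriors, key=posteriors.get)
    print(best, posteriors)
```

The alignment is assigned to whichever class has the largest posterior; working in log space matters because likelihoods of long alignments are far too small to represent directly.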
Pairwise alignment incorporating dipeptide covariation
Gavin E. Crooks, Richard E. Green, Steven E. Brenner
Quantitative Biology, 2005, DOI: 10.1093/bioinformatics/bti616
Abstract: Motivation: Standard algorithms for pairwise protein sequence alignment make the simplifying assumption that amino acid substitutions at neighboring sites are uncorrelated. This assumption allows implementation of fast algorithms for pairwise sequence alignment, but it ignores information that could conceivably increase the power of remote homolog detection. We examine the validity of this assumption by constructing extended substitution matrices that encapsulate the observed correlations between neighboring sites, by developing an efficient and rigorous algorithm for pairwise protein sequence alignment that incorporates these local substitution correlations, and by assessing the ability of this algorithm to detect remote homologies. Results: Our analysis indicates that local correlations between substitutions are not strong on average. Furthermore, incorporating local substitution correlations into pairwise alignment did not lead to a statistically significant improvement in remote homology detection. Therefore, the standard assumption that individual residues within protein sequences evolve independently of neighboring positions appears to be an efficient and appropriate approximation.