Search Results: 1 - 10 of 100 matches for " "
All listed articles are free for downloading (OA Articles)
Dinucleotide controlled null models for comparative RNA gene prediction
Tanja Gesell, Stefan Washietl
BMC Bioinformatics , 2008, DOI: 10.1186/1471-2105-9-248
Abstract: We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance-based approach, we estimate a tree under this model, which is then used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments, and its effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm, giving a new variant of a thermodynamic, structure-based RNA gene finding program that is not biased by dinucleotide content. SISSIz implements an efficient algorithm to randomize multiple alignments while preserving dinucleotide content. It can be used to obtain more accurate estimates of the false positive rates of existing programs, to produce negative controls for training machine-learning-based programs, or as a standalone RNA gene finding program. Other applications in comparative genomics that require randomization of multiple alignments can also be considered. SISSIz is available as open source C code that can be compiled for every major platform and downloaded here: http://sourceforge.net/projects/sissiz. Comparative genome analysis is currently the most widely used strategy to detect and annotate noncoding RNAs (ncRNAs) [1,2]. In the past few years, a series of algorithms have been developed that predict functional ncRNAs on the basis of conserved secondary structure [3-10]. Some of these methods have been used to predict novel ncRNAs on a genome-wide scale [7,11-14]. 
In combination with experimental verification (microarray, RT-PCR, Northern blot) these methods could successfully uncover many examples of novel nc
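SISSIz is described as preserving the average dinucleotide content of an alignment. As a minimal illustration of that quantity only (not of SISSIz's phylogenetic simulation), the Python sketch below pools dinucleotide counts over the ungapped rows of an alignment; comparing its output for a native and a randomized alignment is one way to check that a null model kept dinucleotide content intact. Deleting gaps before counting and mapping U to T are assumptions of this sketch, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(ALPHABET, repeat=2)]  # AA, AC, ..., TT
VALID = set(DINUCLEOTIDES)


def dinucleotide_frequencies(alignment: list[str]) -> dict[str, float]:
    """Pooled relative frequencies of the 16 dinucleotides in an alignment.

    Assumption: gaps ('-') are deleted first, so a dinucleotide may span a
    gap column; this is an illustrative choice, not SISSIz's convention.
    """
    counts = Counter()
    for row in alignment:
        seq = row.replace("-", "").upper().replace("U", "T")  # RNA -> DNA alphabet
        for i in range(len(seq) - 1):
            pair = seq[i:i + 2]
            if pair in VALID:  # skip pairs containing N or other symbols
                counts[pair] += 1
    total = sum(counts.values())
    return {d: (counts[d] / total if total else 0.0) for d in DINUCLEOTIDES}


freqs = dinucleotide_frequencies(["ACG-U", "AC-GU"])
print(freqs["CG"])
```

Both rows reduce to `ACGT`, so AC, CG and GT each make up one third of the pooled counts.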
Target prioritization and strategy selection for active case-finding of pulmonary tuberculosis: A tool to support country-level project planning
Nobuyuki Nishikiori, Catharina Van Weezenbeek
BMC Public Health , 2013, DOI: 10.1186/1471-2458-13-97
Abstract: A simple deterministic model was developed to calculate the estimated number of TB cases diagnosed and the associated costs of diagnosis. The model was designed to compare cost-effectiveness parameters, such as the cost per case detected, for different diagnostic algorithms applied to different risk populations. The model was converted into a web-based tool that can support national TB programmes and civil society partners in designing active case-finding (ACF) activities. According to the model output, tuberculosis active case-finding can be a costly endeavor, depending on the target population and the diagnostic strategy. The analysis suggests the following: (1) Active case-finding activities are cost-effective only if the tuberculosis prevalence among the target population is high. (2) Extensive diagnostic methods (e.g. X-ray screening for the entire group, use of sputum culture or molecular diagnostics) can be applied only to very high-risk groups such as TB contacts, prisoners or people living with human immunodeficiency virus (HIV) infection. (3) Basic diagnostic approaches such as TB symptom screening are always applicable, although their diagnostic yield is very limited. The cost-effectiveness parameter was sensitive to local diagnostic costs and the tuberculosis prevalence of target populations. The prioritization of appropriate target populations and careful selection of cost-effective diagnostic strategies are critical prerequisites for rational active case-finding activities. A decision to conduct such activities should be based on setting-specific cost-effectiveness analysis and programmatic assessment. A web-based tool was developed and is available to support national tuberculosis programmes and partners in the formulation of cost-effective active case-finding activities at the national and subnational levels.
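The cost per case detected can be sketched as total screening cost divided by the number of cases found. The Python below is a deliberately simplified, hypothetical version: it assumes a single screening step with a fixed per-person cost and a fixed sensitivity, whereas the published model compares multi-step diagnostic algorithms whose exact structure is not given in the abstract. The class, function names and numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    """A hypothetical one-step screening strategy; parameters are illustrative."""
    name: str
    cost_per_person_screened: float  # cost of screening one person
    sensitivity: float               # fraction of true TB cases the strategy detects


def cases_detected(population: int, prevalence: float, strategy: Strategy) -> float:
    """Expected number of true cases found: N x prevalence x sensitivity."""
    return population * prevalence * strategy.sensitivity


def cost_per_case_detected(population: int, prevalence: float, strategy: Strategy) -> float:
    """Total screening cost divided by expected cases detected."""
    total_cost = population * strategy.cost_per_person_screened
    detected = cases_detected(population, prevalence, strategy)
    return total_cost / detected if detected > 0 else float("inf")


symptom_screen = Strategy("symptom screening (hypothetical)", 5.0, 0.5)
for prevalence in (0.01, 0.001):
    print(prevalence, cost_per_case_detected(10_000, prevalence, symptom_screen))
```

With 10,000 people screened at 5 per person and 50% sensitivity, the cost per case detected is 1,000 at 1% prevalence but 10,000 at 0.1% prevalence, which mirrors the abstract's point that active case-finding pays off only in high-prevalence groups.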
STELLAR: fast and exact local alignments
Birte Kehr, David Weese, Knut Reinert
BMC Bioinformatics , 2011, DOI: 10.1186/1471-2105-12-s9-s15
Abstract: Background Large-scale comparison of genomic sequences requires reliable tools for the search of local alignments. Practical local aligners are in general fast, but heuristic, and hence sometimes miss significant matches. Results We present here the local pairwise aligner STELLAR that has full sensitivity for ε-alignments, i.e. guarantees to report all local alignments of a given minimal length and maximal error rate. The aligner is composed of two steps, filtering and verification. We apply the SWIFT algorithm for lossless filtering, and have developed a new verification strategy that we prove to be exact. Our results on simulated and real genomic data confirm and quantify the conjecture that heuristic tools like BLAST or BLAT miss a large percentage of significant local alignments. Conclusions STELLAR is very practical and fast on very long sequences which makes it a suitable new tool for finding local alignments between genomic sequences under the edit distance model. Binaries are freely available for Linux, Windows, and Mac OS X at http://www.seqan.de/projects/stellar. The source code is freely distributed with the SeqAn C++ library version 1.3 and later at http://www.seqan.de.
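STELLAR itself finds candidates with SWIFT filtering and then verifies them; it does not compare segments pair by pair. The sketch below only spells out an acceptance criterion of the kind the abstract describes (minimal length, maximal error rate under the edit distance model). The exact definition used here, errors at most epsilon times the length of the shorter segment, is an assumption for illustration, and `is_epsilon_match` is a hypothetical name.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: unit-cost substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match (0) or substitution (1)
            ))
        prev = curr
    return prev[-1]


def is_epsilon_match(a: str, b: str, epsilon: float, min_length: int) -> bool:
    """Hypothetical epsilon-match test under an assumed definition.

    Both segments must be at least min_length long, and their edit distance
    may not exceed epsilon times the length of the shorter segment.
    """
    shorter = min(len(a), len(b))
    if shorter < min_length:
        return False
    return edit_distance(a, b) <= epsilon * shorter


print(is_epsilon_match("ACGTACGTAC", "ACGTTCGTAC", epsilon=0.1, min_length=10))
```

One substitution in ten positions stays within a 10% error rate; a second substitution would not.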
How accurately is ncRNA aligned within whole-genome multiple alignments?
Adrienne X Wang, Walter L Ruzzo, Martin Tompa
BMC Bioinformatics , 2007, DOI: 10.1186/1471-2105-8-417
Abstract: We evaluate the alignment accuracy of certain noncoding regions using noncoding RNA alignments from Rfam as a reference. We inspect the MULTIZ 17-vertebrate alignment from the UCSC Genome Browser for all the human sequences in the Rfam seed alignments. In particular, we find 638 instances of chimeric and partial alignments to human noncoding RNA elements, of which at least 225 can be improved by straightforward means. As a byproduct of our procedure, we predict many novel instances of known ncRNA families that are suggested by the alignment. MULTIZ does a fairly accurate job of aligning these genomes in these difficult regions. However, our experiments indicate that better alignments exist in some regions. At a time when so many genome sequences are reaching completion, alignments of multiple whole genomes are of great value to biologists, since enlightening evolutionary information is encoded in the conservation and variation across species. Multiple alignment on genomic scales is also a great challenge to algorithm designers. Protein-coding regions usually evolve more slowly than noncoding regions, and therefore tend to be easier to align. In contrast, noncoding regions are still challenging to align correctly. Because of this, a number of recent reviews and articles [1-4] have made compelling pleas for methods to assess the accuracy of multiple sequence alignments and to compare the alignments produced by different tools. We use alignments of noncoding RNA (ncRNA) as a test of the accuracy of multiple alignment of genomic regions that are difficult to align. This is a rather challenging test, as many functional RNAs exhibit weak primary sequence conservation [5]. As a byproduct of our procedure, we predict many novel instances of known ncRNA families that are suggested by the alignment. We return to this topic in the Discussion section. Multiple sequence alignment is a difficult computational problem. 
Technically, the problem of finding an optimal multiple sequenc
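The abstract does not spell out a numeric accuracy measure. A common way to compare a test alignment with a reference (for example, an Rfam seed alignment) is the sum-of-pairs score: the fraction of residue pairs aligned in the reference that the test alignment also aligns. The sketch below assumes both alignments contain the same sequences in the same row order; it is a generic metric, not the procedure used in the paper, and the function names are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment: list[str]) -> set[tuple[int, int, int, int]]:
    """All residue pairs placed in the same column.

    Each pair is (row_i, pos_i, row_j, pos_j), where pos counts ungapped
    residues within a row, so pairs are comparable across alignments.
    """
    positions = [0] * len(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for r, row in enumerate(alignment):
            if row[col] != "-":
                present.append((r, positions[r]))
                positions[r] += 1
        for (ri, pi), (rj, pj) in combinations(present, 2):
            pairs.add((ri, pi, rj, pj))
    return pairs


def sum_of_pairs_score(test: list[str], reference: list[str]) -> float:
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref = aligned_pairs(reference)
    if not ref:
        return 1.0  # nothing to recover
    return len(ref & aligned_pairs(test)) / len(ref)


print(sum_of_pairs_score(["ACG-T", "ACAGT"], ["AC-GT", "ACAGT"]))
```

Here the test alignment misplaces the G of the first row, so it recovers 3 of the 4 reference pairs.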
Progressive multiple sequence alignments from triplets
Matthias Kruspe, Peter F Stadler
BMC Bioinformatics , 2007, DOI: 10.1186/1471-2105-8-254
Abstract: Here we present a modified variant of progressive sequence alignment that addresses both issues. Instead of pairwise alignments, we use exact dynamic programming to align sequence or profile triples. This avoids a large fraction of the ambiguities arising in pairwise alignments. In the subsequent aggregation steps, we follow the logic of the Neighbor-Net algorithm, which constructs a phylogenetic network by replacing triples by pairs step by step instead of combining pairs to singletons. To this end, the three-way alignments are subdivided into two partial alignments, at which stage all-gap columns are naturally removed. This alleviates the "once a gap, always a gap" problem of progressive alignment procedures. The three-way Neighbor-Net based alignment program aln3nn is shown to compare favorably to other progressive alignment tools on both protein sequences and nucleic acid sequences. In the latter case, one can easily include scoring terms that consider secondary structure features. Overall, the quality of the resulting alignments generally exceeds that of clustalw or other multiple alignment tools, even though our software does not include heuristics for context-dependent (mis)match scores. (The software is freely available for download from reference [1].) High-quality multiple sequence alignments (MSAs) are a prerequisite for many applications in bioinformatics, from the reconstruction of phylogenies and the assessment of evolutionary rate variations to gene finding and phylogenetic footprinting. A large part of comparative genomics thus hinges on our ability to construct accurate MSAs. Since the multiple sequence alignment problem is NP-hard [2], with the computational cost growing exponentially with the number of sequences, it has been a long-standing challenge to devise approximation algorithms that are both efficient and accurate. These approaches can be classified into progressive, iterative, and stochastic alignment algorithms. 
The most widely used tools such as
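The core step the abstract names is exact dynamic programming over three sequences. The sketch below computes the optimal sum-of-pairs score of a three-way alignment of plain sequences with linear gap costs. It assumes illustrative scores (match +1, mismatch -1, gap -2) and omits profiles, affine gaps, traceback, and the secondary-structure terms mentioned above, so it is not aln3nn's scoring scheme. Each cell considers the 7 non-empty ways of consuming a residue from some subset of the three sequences, so time and memory grow as len(a) x len(b) x len(c).

```python
from itertools import product


def pair_score(x: str, y: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score of one pair of symbols in a column; gap-gap pairs cost nothing."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def align3_score(a: str, b: str, c: str, **scores) -> int:
    """Optimal sum-of-pairs score of a three-way alignment by exact 3D DP."""
    la, lb, lc = len(a), len(b), len(c)
    neg = float("-inf")
    D = [[[neg] * (lc + 1) for _ in range(lb + 1)] for _ in range(la + 1)]
    D[0][0][0] = 0
    moves = [m for m in product((0, 1), repeat=3) if any(m)]  # the 7 column types
    for i in range(la + 1):
        for j in range(lb + 1):
            for k in range(lc + 1):
                if i == j == k == 0:
                    continue
                best = neg
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue
                    # The new column holds a residue where a sequence advances, else a gap.
                    x = a[pi] if di else "-"
                    y = b[pj] if dj else "-"
                    z = c[pk] if dk else "-"
                    column = (pair_score(x, y, **scores) + pair_score(x, z, **scores)
                              + pair_score(y, z, **scores))
                    best = max(best, D[pi][pj][pk] + column)
                D[i][j][k] = best
    return D[la][lb][lc]


print(align3_score("ACG", "ACG", "ACG"))
```

Three identical sequences align column by column, giving 3 columns x 3 matching pairs = 9.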
Comparative Studies of Vertebrate Beta Integrin Genes and Proteins: Ancient Genes in Vertebrate Evolution
Roger S. Holmes, Ujjwal K. Rout
Biomolecules , 2011, DOI: 10.3390/biom1010003
Abstract: Integrins are heterodimeric membrane receptor proteins, composed of α- and β-subunits, which serve various cell adhesion roles in tissue repair, hemostasis, immune response, embryogenesis and metastasis. At least 18 α- (ITA or ITGA) and 8 β-integrin subunits (ITB or ITGB) are encoded on mammalian genomes. Comparative ITB amino acid sequences and protein structures and ITB gene locations were examined using data from several vertebrate genome projects. Vertebrate ITB genes usually contained 13–16 coding exons and encoded protein subunits with ~800 amino acids, whereas vertebrate ITB4 genes contained 36-39 coding exons and encoded larger proteins with ~1800 amino acids. The ITB sequences exhibited several conserved domains, including signal peptide, extracellular β-integrin, β-tail domain and integrin β-cytoplasmic domains. Sequence alignments of the integrin β-cytoplasmic domains revealed highly conserved regions, possibly reflecting essential functions that have been maintained during vertebrate evolution. With the exception of the human ITB8 sequence, the other ITB sequences shared a predicted 19-residue α-helix for this region. Potential sites for regulating human ITB gene expression were identified, including CpG islands, transcription factor binding sites and microRNA binding sites within the 3’-UTR of human ITB genes. Phylogenetic analyses of the relationships of vertebrate beta-integrin genes were consistent with four major groups (1: ITB1, ITB2, ITB7; 2: ITB3, ITB5, ITB6; 3: ITB4; and 4: ITB8) and with a common evolutionary origin from an ancestral gene, prior to the appearance of fish during vertebrate evolution. The phylogenetic analyses revealed that ITB4 is the most likely primordial form of the vertebrate β-integrin subunit encoding genes; ITB4 is the only β subunit expressed as a constituent of the sole integrin receptor ‘α6β4’ in the hemidesmosomes of unicellular organisms.
KISSa: a strategy to build multiple sequence alignments from pairwise comparisons of very closely related sequences
Francesco Marass, Chris Upton
BMC Research Notes , 2009, DOI: 10.1186/1756-0500-2-91
Abstract: We present a simple strategy to enable the creation of large quasi-multiple sequence alignments from pairwise alignment data. This process is suitable for large, closely related sequences, such as the polyproteins of dengue viruses, which require the insertion of very few indels. The quasi-multiple sequence alignments generated by KISSa are sufficiently accurate to support tree-based genome selection for interactive bioinformatics analysis tools. The speed of this process is critical to providing an interactive experience for the user. There are many reasons for constructing multiple sequence alignments (MSAs), which form the backbone of comparative analyses and the starting point of phylogenetic studies. Similarly, there are a variety of algorithms (CLUSTAL [1,2], T-Coffee [3], MUSCLE [4]) and software tools that allow a researcher to input a series of DNA or protein sequences and obtain an MSA. The output is usually viewed using a graphical user interface (GUI) that may also permit editing of the MSA (Jalview [5,6], Base-By-Base (BBB) [7]). The alignment of large DNA sequences, in the size range of bacterial chromosomes, requires specialized alignment tools and viewers [8,9]. The mandate of the NIH-funded Viral Bioinformatics Resource Center (VBRC) includes collection, annotation and storage of the complete genomes for seven virus families, and both the VBRC administrators and researchers frequently create MSAs of genes, proteins and genomes. Recently, selecting genomes from large data sets using menu-type lists has become onerous (the VBRC currently has >1200 dengue virus genomes of 4 genotypes), and we are considering graphical tools based on phylogenetic trees to help users through the data selection process. This will require frequent generation of large MSAs and consume considerable computation time, because trees need to be regenerated as new genomes are added to the VBRC database. 
Although these alignments could be done at off-peak times, or the new sequences a
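The abstract only says that KISSa builds quasi-MSAs from pairwise alignment data. One simple way to do that, sketched below as an assumption rather than KISSa's published procedure, is to align every query pairwise to one common reference and then merge the results on reference coordinates: residues opposite a reference residue share its column, and residues a query inserts between two reference positions go into a gap block as wide as the longest insertion any query has there. Inserted residues from different queries are left-justified but not aligned to each other, which is what makes the result "quasi". The function name is hypothetical.

```python
def merge_pairwise(reference: str, pairwise: list[tuple[str, str]]) -> list[str]:
    """Merge pairwise (reference, query) alignments into one quasi-MSA.

    Row 0 is the reference; row n is the n-th query. Query insertions relative
    to the reference are left-justified and gap-padded, not aligned to each other.
    """
    n_ref = len(reference)
    parsed = []
    for ref_aln, qry_aln in pairwise:
        if ref_aln.replace("-", "") != reference or len(ref_aln) != len(qry_aln):
            raise ValueError("pairwise alignment does not match the reference")
        inserts = [""] * (n_ref + 1)  # query residues placed before reference residue p
        aligned = []                  # query symbol opposite each reference residue
        p = 0
        for r, q in zip(ref_aln, qry_aln):
            if r == "-":
                if q != "-":
                    inserts[p] += q
            else:
                aligned.append(q)
                p += 1
        parsed.append((inserts, aligned))

    # Each insertion slot is as wide as the longest insertion any query has there.
    widths = [max((len(ins[p]) for ins, _ in parsed), default=0) for p in range(n_ref + 1)]

    rows = [[] for _ in range(len(parsed) + 1)]
    for p in range(n_ref + 1):
        rows[0].append("-" * widths[p])
        for n, (ins, _) in enumerate(parsed, start=1):
            rows[n].append(ins[p].ljust(widths[p], "-"))
        if p < n_ref:
            rows[0].append(reference[p])
            for n, (_, aligned) in enumerate(parsed, start=1):
                rows[n].append(aligned[p])
    return ["".join(row) for row in rows]


for row in merge_pairwise("ACGT", [("AC-GT", "ACTGT"), ("ACGT", "A-GT")]):
    print(row)
```

Because each query is aligned only once, against the reference, the cost grows linearly with the number of genomes, which fits the case of closely related sequences with very few indels.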
Genomics and proteomics of vertebrate cholesterol ester lipase (LIPA) and cholesterol 25-hydroxylase (CH25H)
Roger S. Holmes, John L. VandeBerg, Laura A. Cox
3 Biotech , 2011, DOI: 10.1007/s13205-011-0013-9
Abstract: Cholesterol ester lipase (LIPA) and cholesterol 25-hydroxylase (CH25H) play essential roles in cholesterol metabolism in the body: LIPA hydrolyses cholesteryl esters and triglycerides within lysosomes, and CH25H catalyses the formation of 25-hydroxycholesterol from cholesterol, which acts to repress cholesterol biosynthesis. Bioinformatic methods were used to predict the amino acid sequences, structures and genomic features of several vertebrate LIPA and CH25H genes and proteins, and to examine the phylogeny of vertebrate LIPA. Amino acid sequence alignments and predicted subunit structures enabled the identification of key sequences previously reported for human LIPA and CH25H, and of transmembrane structures for vertebrate CH25H sequences. Vertebrate LIPA and CH25H genes were located in tandem on all vertebrate genomes examined, and several predicted transcription factor binding sites and CpG islands were located within the 5′ regions of the human genes. Vertebrate LIPA genes contained nine coding exons, while all vertebrate CH25H genes were without introns. Phylogenetic analysis demonstrated the distinct nature of the vertebrate LIPA gene and protein family in comparison with other vertebrate acid lipases, and indicated that this family apparently evolved from an ancestral LIPA gene which predated the appearance of vertebrates.
Bioinformatic studies of vertebrate enolases: multifunctional genes and proteins
Roger S Holmes
Open Access Bioinformatics , 2011, DOI: 10.2147/OAB.S16416
Abstract: Original Research. Published February 2011, Volume 2011:3, Pages 43-59. Roger S Holmes, School of Biomolecular and Physical Sciences, Griffith University, Nathan, QLD, Australia. Enolase (ENO) genes and proteins serve multiple functions in the body, including catalyzing 2-phospho-D-glycerate hydro-lyase activity in glycolysis, assisting hypoxia tolerance, tumor suppression, plasminogen and DNA binding, and acting as a lens crystallin. Comparative ENO amino acid sequences and structures and ENO gene locations were examined using data from several vertebrate genome projects. Vertebrate ENO1, ENO2, and ENO3 genes usually contained 11 coding exons, while ENO4 (encoding an ENO-like protein, ENOLL) usually contained 14 coding exons. Vertebrate ENOF1 (or ENO5) genes, which contained 12–15 coding exons, encode an antisense RNA that may regulate mitochondrial thymidylate synthase activity. Vertebrate ENO1, ENO2, and ENO3 sequences shared 78%–98% identities with each other but only 19%–24% with ENO4 and >10% predicted sequence identities with vertebrate ENOF1. Sequence alignments, key amino acid residues, and conserved predicted secondary and tertiary structures were examined, including active site residues (absent in ENO4 and ENOF1) and sites for Mg2+ and plasminogen binding and for acetylation and phosphorylation. The predicted ENO4 structure contained three N-terminal α-helices, two β-sheets, a poly-proline segment, and an extended C-terminal sequence in addition to the typical α/β barrel structure reported for ENO1–3 sequences. Potential transcription factor binding sites (TFBS) and CpG islands for regulating ENO gene expression were identified. 
Human ENO1, ENO2, ENO3, and ENOF1 genes each contained CpG islands in the gene promoter regions consistent with higher-than-average levels of expression. Human ENO3 and ENO1 gene promoters also contained a diverse range of TFBS. The ENO4 gene promoter comprised a CpG island and several TFBS, including AHR1 in the 5'-UTR region, which may suggest a role for ENO4 in aryl hydrocarbon ligand binding or metabolism. Phylogeny studies of vertebrate ENO1, ENO2, and ENO3 genes and enzymes suggested that they originated in a vertebrate ancestor from gene duplication events of an ancestral ENO1-like gene >500 million years ago.
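The abstract does not state which criteria were used to call CpG islands. A classic definition (Gardiner-Garden and Frommer, 1987) requires a region of at least 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6, where observed/expected = (CpG count x length) / (C count x G count). The sketch below tests a single candidate region against those thresholds; the sliding-window scan over a promoter that real tools perform is left out, and the function names are hypothetical.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA region."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" occurrences cannot overlap, so str.count is exact
    gc = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc, obs_exp


def is_cpg_island(seq: str, min_length: int = 200,
                  min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the Gardiner-Garden and Frommer thresholds to one region."""
    gc, obs_exp = cpg_stats(seq)
    return len(seq) >= min_length and gc > min_gc and obs_exp > min_obs_exp


print(is_cpg_island("CG" * 100), is_cpg_island("AT" * 100))
```

A 200 bp run of `CG` repeats passes all three thresholds, while an AT-only region of the same length fails on GC content.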
Comparative Studies of Vertebrate Platelet Glycoprotein 4 (CD36)
Roger S. Holmes
Biomolecules , 2012, DOI: 10.3390/biom2030389
Abstract: Platelet glycoprotein 4 (CD36) (also known as fatty acyl translocase [FAT] or scavenger receptor class B, member 3 [SCARB3]) is an essential cell surface and skeletal muscle outer mitochondrial membrane glycoprotein involved in multiple functions in the body. CD36 serves as a ligand receptor of thrombospondin, long chain fatty acids, oxidized low density lipoproteins (LDLs) and malaria-infected erythrocytes. CD36 also influences various processes and diseases, including angiogenesis, thrombosis, atherosclerosis, malaria, diabetes, steatosis, dementia and obesity. Genetic deficiency of this protein results in significant changes in fatty acid and oxidized lipid uptake. Comparative CD36 amino acid sequences and structures and CD36 gene locations were examined using data from several vertebrate genome projects. Vertebrate CD36 sequences shared 53–100% identity, compared with 29–32% sequence identity with the other CD36-like superfamily members, SCARB1 and SCARB2. At least eight vertebrate CD36 N-glycosylation sites were conserved; these are required for membrane integration. Sequence alignments, key amino acid residues and predicted secondary structures were also studied. Three CD36 domains were identified: cytoplasmic, transmembrane and exoplasmic sequences. Conserved sequences included N- and C-terminal transmembrane glycines; exoplasmic cysteine disulphide residues; TSP-1 and PE binding sites, Thr92 and His242, respectively; 17 conserved proline and 14 glycine residues, which may participate in forming CD36 ‘short loops’; and basic amino acid residues, which may contribute to fatty acid and thrombospondin binding. Vertebrate CD36 genes usually contained 12 coding exons. The human CD36 gene contained transcription factor binding sites (including PPARG and PPARA) contributing to a high gene expression level (6.6 times average). 
Phylogenetic analyses examined the relationships and potential evolutionary origins of the vertebrate CD36 gene with vertebrate SCARB1 and SCARB2 genes. These suggested that CD36 originated in an ancestral genome and was subsequently duplicated to form three vertebrate CD36 gene family members, SCARB1, SCARB2 and CD36.
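Percent identity figures such as the 53–100% reported for vertebrate CD36 depend on how the denominator is chosen, and the abstract does not say which convention was used. One common choice, sketched below as an assumption, counts identical residues over the aligned columns in which neither sequence has a gap.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences of equal length.

    Assumed convention: columns where either sequence has a gap ('-') are
    excluded from the denominator.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    identical = sum(x == y for x, y in compared)
    return 100.0 * identical / len(compared)


print(percent_identity("MKT-LV", "MKSALV"))
```

Of the five gap-free columns, four are identical, giving 80%. Dividing by the full alignment length instead would give a lower value (4 of 6 columns, about 67%), which is why identity ranges from different studies are not always directly comparable.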

Copyright © 2008-2017 Open Access Library. All rights reserved.