oalib

Search Results: 1 - 10 of 407526 matches for " Alan M Moses "
All listed articles are free for downloading (OA Articles)
Statistical tests for natural selection on regulatory regions based on the strength of transcription factor binding sites
Alan M Moses
BMC Evolutionary Biology , 2009, DOI: 10.1186/1471-2148-9-286
Abstract: Here, tests for natural selection on regulatory regions are proposed based on nucleotide substitutions that occur in characterized transcription factor binding sites (an important type of functional element within regulatory regions). In the absence of selection, these substitutions will tend to reduce the strength of existing binding sites. On the other hand, purifying selection will act to preserve the binding sites in regulatory regions, while positive selection can act to create or destroy binding sites, as well as change their strength. Using standard models of binding site strength and molecular evolution in the absence of selection, this intuition can be used to develop statistical tests for natural selection. Application of these tests to two well-characterized regulatory regions in Drosophila provides evidence for purifying selection. This demonstrates that it is possible to develop tests for selection on regulatory regions based on the specific functional constraints on these sequences. The importance of cis-regulatory regions in the evolution of complex organisms is increasingly appreciated (reviewed in [1] and [2]), and general understanding of the molecular evolution of these sequences has grown rapidly [3-13]. An important outstanding question is whether natural selection has driven evolutionary changes in cis-regulatory regions, or whether these result from non-adaptive processes [14]. Many tests for natural selection can be applied to non-coding DNA and several important studies have identified signatures of natural selection in well-characterized regulatory regions (reviewed in [15]). Tests for selection on differences between species often compare the ratio of substitutions in transcription factor binding sites (an important class of functional element within cis-regulatory regions) to the surrounding non-coding DNA [16].
These tests are modelled after tests on coding regions that compare the patterns of amino acid changing differences to synonymous diffe
Ranking insertion, deletion and nonsense mutations based on their effect on genetic information
Amin Zia, Alan M Moses
BMC Bioinformatics , 2011, DOI: 10.1186/1471-2105-12-299
Abstract: We propose computational methods to rank insertion-deletion mutations in the coding as well as non-coding regions and nonsense mutations. We rank these variations by measuring the extent of their effect on biological function, based on the assumption that evolutionary conservation reflects function. Using sequence data from budding yeast and human, we show that variations that we predict to have larger effects segregate at significantly lower allele frequencies, and occur less frequently than expected by chance, indicating stronger purifying selection. Furthermore, we find that insertions, deletions and premature stop codons associated with disease in humans have significantly larger predicted effects than those not associated with disease. Interestingly, the large-effect mutations associated with disease show a similar distribution of predicted effects to that expected for completely random mutations. This demonstrates that the evolutionary conservation context of the sequences that harbour insertions, deletions and nonsense mutations can be used to predict and rank the effects of the mutations. Genetic variations contribute to normal phenotypic variation [1]. For humans, it is estimated that there are more than 10 million SNPs (i.e. 1 in 300 base pairs on average) with an observed minor allele frequency of ≥ 1% in the population [2]. Recent advances in sequencing technologies [3] have enabled rapid discovery of other types of variations, including mutations expected to have very large effects on protein function such as frame shifting insertions and deletions (indels) and nonsense mutations (mutations that introduce premature stop codons). Amazingly, insertions and deletions are also abundant in the human genome with sizes ranging from single to several million base pairs (bp) [4,5].
For example, in 179 human genomes there were 1.13 million short indels identified [6] indicating an estimate of 1 million indels per human genome (1 in 3600 bps on average). Sim
Towards a theoretical understanding of false positives in DNA motif finding
Amin Zia, Alan M Moses
BMC Bioinformatics , 2012, DOI: 10.1186/1471-2105-13-151
Abstract: Using large-deviations theory, we derive a remarkably simple relationship that describes the dependence of false positives on dataset size for the one-occurrence-per-sequence motif-finding problem. As expected, we predict that false positives can be reduced by decreasing the sequence length or by adding more sequences to the dataset. Interestingly, we find that the false-positive strength depends more strongly on the number of sequences in the dataset than it does on the sequence length, but that the dependence on the number of sequences diminishes, after which adding more sequences does not reduce the false-positive rate significantly. We test our theoretical predictions by applying four popular motif-finding algorithms that solve the one-occurrence-per-sequence problem (MEME, the Gibbs Sampler, Weeder, and GIMSAN) to simulated data that contain no motifs. We find that the dependence of false positives detected by these programs on the motif-finding parameters is similar to that predicted by our formula. We quantify the relationship between the sequence search space and motif-finding false positives. Based on the simple formula we derive, we provide a number of intuitive rules of thumb that may be used to enhance motif-finding results in practice. Our results provide a theoretical advance in an important problem in computational biology.
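The qualitative trend described in this abstract (more chance hits with longer sequences, fewer with more sequences) can be illustrated with a far cruder calculation than the paper's large-deviations result. The sketch below is not the authors' formula: it assumes uniform, independent bases, a single fixed word of a given width, one strand, and it treats overlapping positions as independent, which is only an approximation. The function names are hypothetical.

```python
def prob_chance_hit_one_sequence(seq_len, width):
    """Approximate probability that one fixed word of `width` bases occurs
    at least once in a random DNA sequence of `seq_len` bases.

    Assumptions (not from the paper): uniform independent bases, one strand,
    and independence between overlapping start positions.
    """
    positions = max(seq_len - width + 1, 0)   # possible start positions
    p_match = 0.25 ** width                   # chance of a match at one position
    return 1.0 - (1.0 - p_match) ** positions


def prob_chance_hit_all(seq_len, width, n_sequences):
    """Approximate probability that the same fixed word occurs by chance in
    every one of `n_sequences` independent random sequences, i.e. a spurious
    one-occurrence-per-sequence 'motif'."""
    return prob_chance_hit_one_sequence(seq_len, width) ** n_sequences


if __name__ == "__main__":
    # Longer sequences raise the chance of a spurious hit; more sequences lower it.
    for seq_len in (100, 500, 1000):
        for n in (5, 20):
            print(seq_len, n, prob_chance_hit_all(seq_len, 6, n))
```

A real motif finder searches over many candidate motifs and tolerates mismatches, so actual false-positive behaviour is much richer than this toy calculation suggests.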
Modeling the evolution of a classic genetic switch
Christos Josephides, Alan M Moses
BMC Systems Biology , 2011, DOI: 10.1186/1752-0509-5-24
Abstract: We develop a modeling framework to examine the evolution of the GAL regulatory network. This enables us to translate molecular changes in the regulatory network to changes in quantitative network function. We computationally reconstruct an inferred ancestral version of the network and trace the evolutionary paths in the lineage leading to S. cerevisiae. We explore the evolutionary landscape of possible regulatory networks and find that the operation of intermediate networks leading to S. cerevisiae differs substantially depending on the order in which evolutionary changes accumulate; in particular, we systematically explore evolutionary paths and find that some network features cannot be optimized simultaneously. We find that a computational modeling approach can be used to analyze the evolution of a well-studied regulatory network. Our results are consistent with several experimental studies of the evolution of the GAL regulatory network, including increased fitness in Saccharomyces due to duplication and adaptive regulatory divergence. The conceptual and computational tools that we have developed may be applicable in further studies of regulatory network evolution. Regulatory networks are known to underlie many biological processes, and therefore their characterization and analysis forms a central focus of systems biology [1-4]. Despite their importance, relatively little is known about how regulatory networks are formed during evolution and shaped by natural selection. One of the best studied regulatory networks in molecular biology is the "GAL network", which is responsible for the inducible metabolism of galactose in budding yeast. In addition to being extremely well-characterized in S. cerevisiae [5-7] it has also been the subject of a number of quantitative modeling efforts [8-11] and evolutionary studies, which have revealed many interesting patterns of regulatory network evolution [12-15].
Perhaps most general of these evolutionary paradigms is the duplicat
Towards a theoretical understanding of false positives in DNA motif finding
Amin Zia, Alan M. Moses
Quantitative Biology , 2010
Abstract: Detection of false-positive motifs is one of the main causes of low performance in motif finding methods. It is generally assumed that false-positives are mostly due to algorithmic weakness of motif-finders. Here, however, we derive the theoretical dependence of false positives on dataset size and find that false positives can arise as a result of large dataset size, irrespective of the algorithm used. Interestingly, the false-positive strength depends more on the number of sequences in the dataset than it does on the sequence length. As expected, false-positives can be reduced by decreasing the sequence length or by adding more sequences to the dataset. The dependence on number of sequences, however, diminishes and reaches a plateau after which adding more sequences to the dataset does not reduce the false-positive rate significantly. Based on the theoretical results presented here, we provide a number of intuitive rules of thumb that may be used to enhance motif-finding results in practice.
Clustering of phosphorylation site recognition motifs can be exploited to predict the targets of cyclin-dependent kinase
Alan M Moses, Jean-Karim Hériché, Richard Durbin
Genome Biology , 2007, DOI: 10.1186/gb-2007-8-2-r23
Abstract: Protein kinases are ubiquitous components of cellular signalling networks [1]. A relatively well understood example is the network that controls progression of the cell cycle, where cyclin-dependent kinases (CDKs) couple with various cyclins over the cell cycle to regulate critical processes [2-4]. Despite their biological and medical importance, relatively few direct, in vivo targets of these kinases have been identified conclusively, because experimental techniques are difficult and time consuming [1,5]. With the availability of databases of protein sequences, computational methods provide an alternative approach [6,7]. Kinase substrates often have short, degenerate sequence motifs surrounding the phosphorylated residue [8]. Putative target residues can be predicted by searching for matches to the consensus for a particular kinase. For example, CDK substrates often contain S/T-P-X-R/K where X represents any amino acid, and S/T represents the phosphorylated serine or threonine [9,10]. Because of the low specificity of the CDK consensus, however, databases of protein sequences are expected to contain large numbers of matches by chance. Therefore, many of the matches in protein sequences are likely to be false-positive predictions. Consistent with this, when 553 Saccharomyces cerevisiae proteins with at least one match to the CDK consensus were tested in a high-throughput kinase assay, only 32% (178) were found to be substrates [11]. Furthermore, in some cases characterized CDK substrates are phosphorylated at residues matching only a minimal consensus S/T-P [12]; considering these weak matches would probably lead to even larger numbers of false positives. Characterized CDK targets may be phosphorylated at multiple residues (for instance, see the report by Lees and coworkers [13]). Recent studies of several CDK target proteins in S. cerevisiae have shown that these multiple phosphorylations can regulate stability [12], protein interaction [14,15], or localization [16].
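The consensus search described in this abstract can be sketched with regular expressions. The patterns below encode the full consensus S/T-P-X-R/K and the minimal consensus S/T-P exactly as quoted; everything else is an assumption made for illustration, including the function names, the toy sequence (not a real protein), and the sliding-window count, which is only a simple stand-in for clustering and is not the scoring method of the paper.

```python
import re

# Full and minimal CDK consensus as quoted in the abstract.
# Lookaheads are used so that overlapping matches are all reported.
FULL_CONSENSUS = re.compile(r"(?=([ST]P.[KR]))")
MINIMAL_CONSENSUS = re.compile(r"(?=([ST]P))")


def consensus_positions(protein, pattern=FULL_CONSENSUS):
    """Return 0-based positions of the candidate phosphorylated S/T residues."""
    return [m.start() for m in pattern.finditer(protein)]


def max_matches_in_window(positions, window):
    """Largest number of match positions that fit inside any stretch of
    `window` residues (a hypothetical, simple measure of clustering)."""
    positions = sorted(positions)
    best = 0
    left = 0
    for right, pos in enumerate(positions):
        # Shrink the window from the left until it spans fewer than `window` residues.
        while pos - positions[left] >= window:
            left += 1
        best = max(best, right - left + 1)
    return best


# Hypothetical toy sequence for illustration only.
toy = "MASPEKAATPLRGGSPVK" + "A" * 20 + "SPQQ"

if __name__ == "__main__":
    full = consensus_positions(toy)
    minimal = consensus_positions(toy, MINIMAL_CONSENSUS)
    print("full consensus matches at:", full)
    print("minimal consensus matches at:", minimal)
    print("most full matches within 15 residues:", max_matches_in_window(full, 15))
```

Counting matches inside a window reflects the idea that several nearby matches are less likely to arise by chance than a single isolated one; a proper treatment would compare the observed count against a background expectation.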
Widespread Discordance of Gene Trees with Species Tree in Drosophila: Evidence for Incomplete Lineage Sorting
Daniel A Pollard, Venky N Iyer (equal contributor), Alan M Moses (equal contributor), Michael B Eisen
PLOS Genetics , 2006, DOI: 10.1371/journal.pgen.0020173
Abstract: The phylogenetic relationship of the now fully sequenced species Drosophila erecta and D. yakuba with respect to the D. melanogaster species complex has been a subject of controversy. All three possible groupings of the species have been reported in the past, though recent multi-gene studies suggest that D. erecta and D. yakuba are sister species. Using the whole genomes of each of these species as well as the four other fully sequenced species in the subgenus Sophophora, we set out to investigate the placement of D. erecta and D. yakuba in the D. melanogaster species group and to understand the cause of the past incongruence. Though we find that the phylogeny grouping D. erecta and D. yakuba together is the best supported, we also find widespread incongruence in nucleotide and amino acid substitutions, insertions and deletions, and gene trees. The time inferred to span the two key speciation events is short enough that under the coalescent model, the incongruence could be the result of incomplete lineage sorting. Consistent with the lineage-sorting hypothesis, substitutions supporting the same tree were spatially clustered. Support for the different trees was found to be linked to recombination such that adjacent genes support the same tree most often in regions of low recombination and substitutions supporting the same tree are most enriched roughly on the same scale as linkage disequilibrium, also consistent with lineage sorting. The incongruence was found to be statistically significant and robust to model and species choice. No systematic biases were found. We conclude that phylogenetic incongruence in the D. melanogaster species complex is the result, at least in part, of incomplete lineage sorting. Incomplete lineage sorting will likely cause phylogenetic incongruence in many comparative genomics datasets. 
Methods to infer the correct species tree, the history of every base in the genome, and comparative methods that control for and/or utilize this information will be valuable advancements for the field of comparative genomics.
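The link between a short interval separating two speciation events and gene-tree incongruence can be made concrete with the standard three-species coalescent result: with an internal branch of length T in coalescent units, each of the two discordant gene-tree topologies has probability exp(-T)/3. The sketch below uses that textbook formula; it is not taken from the paper's analysis, the function name is hypothetical, and the conversion of generations to coalescent units (division by 2Ne for a diploid population of constant size) is an assumption stated in the code.

```python
import math


def gene_tree_probabilities(internal_branch):
    """Three-species gene-tree probabilities under the neutral coalescent.

    `internal_branch` is the time between the two speciation events in
    coalescent units (generations / (2 * Ne) for diploids, assuming a
    constant ancestral population size).

    Returns (P(gene tree matches species tree), P(each discordant tree)).
    """
    if internal_branch < 0:
        raise ValueError("branch length must be non-negative")
    p_each_discordant = math.exp(-internal_branch) / 3.0
    p_concordant = 1.0 - 2.0 * p_each_discordant
    return p_concordant, p_each_discordant


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 3.0):
        concordant, discordant = gene_tree_probabilities(t)
        print(f"T={t}: concordant={concordant:.3f}, each discordant={discordant:.3f}")
```

When T approaches zero the three topologies become equally likely, which is why a short internal branch is expected to produce widespread incongruence.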
Determining Physical Constraints in Transcriptional Initiation Complexes Using DNA Sequence Analysis
Ryan K. Shultzaberger, Derek Y. Chiang, Alan M. Moses, Michael B. Eisen
PLOS ONE , 2007, DOI: 10.1371/journal.pone.0001199
Abstract: Eukaryotic gene expression is often under the control of cooperatively acting transcription factors whose binding is limited by structural constraints. By determining these structural constraints, we can understand the “rules” that define functional cooperativity. Conversely, by understanding the rules of binding, we can infer structural characteristics. We have developed an information theory based method for approximating the physical limitations of cooperative interactions by comparing sequence analysis to microarray expression data. When applied to the coordinated binding of the sulfur amino acid regulatory protein Met4 by Cbf1 and Met31, we were able to create a combinatorial model that can correctly identify Met4 regulated genes. Interestingly, we found that the major determinant of Met4 regulation was the sum of the strength of the Cbf1 and Met31 binding sites and that the energetic costs associated with spacing appeared to be minimal.
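The idea of summing the strengths of two binding sites can be sketched with a log-odds score in bits computed from a table of base counts. This is a generic illustration in the spirit of information-theoretic site scoring, not the method of the paper: the count tables, the sites, the uniform background, the pseudocount, and the function names are all hypothetical, and no spacing cost is modelled.

```python
import math

BASES = "ACGT"


def site_strength_bits(site, counts, pseudocount=1.0):
    """Log-odds score of `site` in bits against a uniform background.

    `counts` is a list with one dict per position mapping each base to the
    number of times it was seen in known sites. The pseudocount avoids
    taking the log of zero for unseen bases.
    """
    if len(site) != len(counts):
        raise ValueError("site and count table must have the same length")
    bits = 0.0
    for base, column in zip(site, counts):
        column_total = sum(column[b] for b in BASES) + 4 * pseudocount
        frequency = (column[base] + pseudocount) / column_total
        bits += math.log2(frequency / 0.25)
    return bits


def combined_strength(site_a, counts_a, site_b, counts_b, pseudocount=1.0):
    """Sum of the two site strengths, ignoring any cost for spacing."""
    return (site_strength_bits(site_a, counts_a, pseudocount)
            + site_strength_bits(site_b, counts_b, pseudocount))


# Hypothetical two-position count tables, for illustration only.
toy_counts_a = [{"A": 8, "C": 0, "G": 0, "T": 0}, {"A": 0, "C": 6, "G": 2, "T": 0}]
toy_counts_b = [{"A": 0, "C": 0, "G": 7, "T": 1}, {"A": 1, "C": 0, "G": 0, "T": 7}]

if __name__ == "__main__":
    print(combined_strength("AC", toy_counts_a, "GT", toy_counts_b))
```

A threshold on the summed score would then separate predicted targets from non-targets; choosing that threshold requires expression data, which this sketch does not include.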
Whole-Genome Analysis Reveals That Active Heat Shock Factor Binding Sites Are Mostly Associated with Non-Heat Shock Genes in Drosophila melanogaster
Sarah E. Gonsalves, Alan M. Moses, Zak Razak, Francois Robert, J. Timothy Westwood
PLOS ONE , 2012, DOI: 10.1371/journal.pone.0015934
Abstract: During heat shock (HS) and other stresses, HS gene transcription in eukaryotes is up-regulated by the transcription factor heat shock factor (HSF). While the identities of the major HS genes have been known for more than 30 years, it has been suspected that HSF binds to numerous other genes and potentially regulates their transcription. In this study, we have used a chromatin immunoprecipitation and microarray (ChIP-chip) approach to identify 434 regions in the Drosophila genome that are bound by HSF. We have also performed a transcript analysis of heat shocked Kc167 cells and third instar larvae and compared them to HSF binding sites. The heat-induced transcription profiles were quite different between cells and larvae and surprisingly only about 10% of the genes associated with HSF binding sites show changed transcription. There were also genes that showed changes in transcript levels that did not appear to correlate with HSF binding sites. Analysis of the locations of the HSF binding sites revealed that 57% were contained within genes with approximately 2/3rds of these sites being in introns. We also found that the insulator protein, BEAF, has enriched binding prior to HS to promoters of genes that are bound by HSF upon HS but that are not transcriptionally induced during HS. When the genes associated with HSF binding sites in promoters were analyzed for gene ontology terms, categories such as stress response and transferase activity were enriched whereas analysis of genes having HSF binding sites in introns identified those categories plus ones related to developmental processes and reproduction. These results suggest that Drosophila HSF may be regulating many genes besides the known HS genes and that some of these genes may be regulated during non-stress conditions.
NLStradamus: a simple Hidden Markov Model for nuclear localization signal prediction
Alex N Nguyen Ba, Anastassia Pogoutse, Nicholas Provart, Alan M Moses
BMC Bioinformatics , 2009, DOI: 10.1186/1471-2105-10-202
Abstract: In this paper, we present an analysis of characterized NLSs in yeast, and find, despite the large number of nuclear import pathways, that NLSs seem to show similar patterns of amino acid residues. We test current prediction methods and observe a low true positive rate. We therefore suggest an approach using hidden Markov models (HMMs) to predict novel NLSs in proteins. We show that our method is able to consistently find 37% of the NLSs with a low false positive rate and that our method retains its true positive rate outside of the yeast data set used for the training parameters. Our implementation of this model, NLStradamus, is made available at: http://www.moseslab.csb.utoronto.ca/NLStradamus/ Eukaryotic cells are defined by the presence of their nucleus. The nuclear membrane enclosing the genetic material of the cell is selective in its import of material through its nuclear pores and this translocation is mediated by cellular mechanisms [1,2]. Proteins entering the nucleus must do so through proteins forming the nuclear pores: the nuclear pore complex [3,4]. The pores allow the passive diffusion of small proteins, but bigger proteins entering the nucleus are usually bound by karyopherin complexes on their nuclear localization signal [5]. Although there are many nuclear import pathways in eukaryotic cells, most of these have not been characterized in detail. The best understood is the classical NLS pathway. The recognition of classical NLSs on nuclear proteins is done by the importin-α subunit which in turn is recognized by the importin-β subunit. This trimer (cargo, importin-α and importin-β) is then imported to the nucleus after a series of enzymatic steps [1,6].
Other families of NLSs are independent of importin-α, and may bind directly to one of the members of the importin-β superfamily [1]. Classical NLSs show characteristic patterns of basic residues loosely matching two consensus sequences, K(K/R)X(K/R) and KRX10–12KRXK, termed the 'monopartite' and 'bip