Search Results: 1 - 10 of 100 matches for " "
All listed articles are free for downloading (OA Articles)
Quantifying similarity between motifs
Shobhit Gupta, John A Stamatoyannopoulos, Timothy L Bailey, William Noble
Genome Biology, 2007, DOI: 10.1186/gb-2007-8-2-r24
Abstract: Discovering and characterizing DNA and protein sequence motifs are fundamental problems in computational biology. Here, we use the term 'motif' to refer to a position-specific probability matrix that describes a short sequence of amino acids or nucleotides that is important to the functioning of the cell. For example, the regulation of transcription requires sequence-specific binding of transcription factors to certain cis-acting motifs, which typically are located upstream of transcriptional start sites [1]. On the other hand, protein sequence motifs might correspond to active sites in enzymes or to binding sites in receptors [2]. A wide variety of statistical methods have been developed to identify sequence motifs in an unsupervised manner from collections of functionally related sequences [3]. In addition, databases such as JASPAR [4], TRANSFAC [5], and BLOCKS [6] can be used to scan a sequence of interest for known DNA or protein motifs. In this work we develop a statistical method for comparing two DNA or protein motifs with one another. This type of comparison is valuable within the context of motif discovery. For example, imagine that you are given a collection of promoter regions from genes that share similar mRNA expression profiles, and that a motif discovery algorithm identifies a motif within those promoters. Often, the first question you would ask is whether this new motif resembles some previously identified transcription factor binding site motif. To address this question, you need a computer program that will scan a motif database for matches to your new (query) motif. The program must consider all possible relative offsets between the two motifs, and for DNA motifs it must consider reverse complement matches as well. An example alignment between two similar motifs is shown in Figure 1.
An alternate use for a motif comparison program would be to identify and then eliminate or merge highly redundant motifs within an existing motif database. We are not t
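The search over relative offsets and reverse complements described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: the Euclidean column distance, the `min_overlap` parameter, and the function names are all hypothetical choices made for the example, and the statistical scoring the abstract refers to is not reproduced.

```python
from math import sqrt


def reverse_complement(motif):
    """Reverse the column order and swap A<->T, C<->G.

    Assumes each column is a list of probabilities ordered [A, C, G, T].
    """
    return [col[::-1] for col in reversed(motif)]


def column_distance(p, q):
    """Euclidean distance between two columns (an illustrative choice)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def best_alignment(query, target, min_overlap=2):
    """Try every relative offset on both strands of the query.

    Returns (mean column distance, offset, strand) for the best alignment,
    where query column i is paired with target column i + offset.
    """
    best = None
    for strand, q in (("+", query), ("-", reverse_complement(query))):
        first = -(len(q) - min_overlap)
        last = len(target) - min_overlap
        for offset in range(first, last + 1):
            pairs = [
                (q[i], target[i + offset])
                for i in range(len(q))
                if 0 <= i + offset < len(target)
            ]
            if len(pairs) < min_overlap:
                continue  # too little overlap to be meaningful
            score = sum(column_distance(a, b) for a, b in pairs) / len(pairs)
            if best is None or score < best[0]:
                best = (score, offset, strand)
    return best
```

Averaging over the overlapping columns keeps short overlaps comparable with long ones, but it also lets a two-column overlap tie with a full-length match, which is one reason a statistical treatment of the scores is needed in practice.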
Transcription Factor Binding Site Positioning in Yeast: Proximal Promoter Motifs Characterize TATA-Less Promoters
Ionas Erb, Erik van Nimwegen
PLOS ONE, 2011, DOI: 10.1371/journal.pone.0024279
Abstract: The availability of sequence specificities for a substantial fraction of yeast's transcription factors and comparative genomic algorithms for binding site prediction has made it possible to comprehensively annotate transcription factor binding sites genome-wide. Here we use such a genome-wide annotation for comprehensively studying promoter architecture in yeast, focusing on the distribution of transcription factor binding sites relative to transcription start sites, and the architecture of TATA and TATA-less promoters. For most transcription factors, binding sites are positioned further upstream and vary over a wider range in TATA promoters than in TATA-less promoters. In contrast, a group of ‘proximal promoter motifs’ (GAT1/GLN3/DAL80, FKH1/2, PBF1/2, RPN4, NDT80, and ROX1) occur preferentially in TATA-less promoters and show a strong preference for binding close to the transcription start site in these promoters. We provide evidence that suggests that pre-initiation complexes are recruited at TATA sites in TATA promoters and at the sites of the other proximal promoter motifs in TATA-less promoters. TATA-less promoters can generally be classified by the proximal promoter motif they contain, with different classes of TATA-less promoters showing different patterns of transcription factor binding site positioning and nucleosome coverage. These observations suggest that different modes of regulation of transcription initiation may be operating in the different promoter classes. In addition we show that, across all promoter classes, there is a close match between nucleosome free regions and regions of highest transcription factor binding site density. This close agreement between transcription factor binding site density and nucleosome depletion suggests a direct and general competition between transcription factors and nucleosomes for binding to promoters.
Comprehensive Human Transcription Factor Binding Site Map for Combinatory Binding Motifs Discovery
Arnoldo J. Müller-Molina, Hans R. Schöler, Marcos J. Araúzo-Bravo
PLOS ONE, 2012, DOI: 10.1371/journal.pone.0049086
Abstract: Knowing the map between transcription factors (TFs) and their binding sites is essential to reverse engineer the regulation process. Only about 10%–20% of the transcription factor binding motifs (TFBMs) have been reported. This lack of data hinders the understanding of gene regulation. To address this drawback, we propose a computational method that exploits previously unused TF properties to discover the missing TFBMs and their sites in all human gene promoters. The method starts by predicting a dictionary of regulatory “DNA words.” From this dictionary, it distills 4098 novel predictions. To disclose the crosstalk between motifs, an additional algorithm extracts TF combinatorial binding patterns, creating a collection of TF regulatory syntactic rules. Using these rules, we narrowed down a list of 504 novel motifs that appear frequently in syntax patterns. We tested the predictions against 509 known motifs, confirming that our system can reliably predict ab initio motifs with an accuracy of 81%, far higher than previous approaches. We found that on average, 90% of the discovered combinatorial binding patterns target at least 10 genes, suggesting that supplementary regulatory mechanisms are required to control smaller gene sets independently. Additionally, we discovered that the new TFBMs and their combinatorial patterns convey biological meaning, targeting TFs and genes related to developmental functions. Thus, among all the possible available targets in the genome, the TFs tend to regulate other TFs and genes involved in developmental functions. We provide a comprehensive resource for regulation analysis that includes a dictionary of “DNA words,” newly predicted motifs and their corresponding combinatorial patterns. Combinatorial patterns are a useful filter to discover TFBMs that play a major role in orchestrating other factors and thus are likely to lock/unlock cellular functional clusters.
An intuitionistic approach to scoring DNA sequences against transcription factor binding site motifs
Fernando Garcia-Alcalde, Armando Blanco, Adrian J Shepherd
BMC Bioinformatics, 2010, DOI: 10.1186/1471-2105-11-551
Abstract: We propose SCintuit, a new scoring method for measuring sequence-motif affinity based on IFS theory. Unlike existing methods that consider dependencies between positions, SCintuit is designed to prevent overestimation of less conserved positions of TFBSs. For a given pair of bases, SCintuit is computed not only as a function of their combined probability of occurrence, but also taking into account the individual importance of each single base at its corresponding position. We used SCintuit to identify known TFBSs in DNA sequences. Our method provides excellent results when dealing with both synthetic and real data, outperforming the sensitivity and the specificity of two existing methods in all the experiments we performed. The results show that SCintuit improves the prediction quality for TFs of the existing approaches without compromising sensitivity. In addition, we show how SCintuit can be successfully applied to real research problems. In this study the reliability of the IFS theory for motif discovery tasks is proven. Cells control the abundance of proteins by means of diverse mechanisms. One such mechanism is the regulation of transcription, which is a continuous process whereby many factors combine to ensure appropriate rates of protein synthesis. Understanding such complex processes is one of the main objectives in computational biology. In its early stages, transcription is controlled, among other mechanisms, by the binding of proteins called transcription factors (TFs) to specific regions of a given chromosome called transcription factor binding sites (TFBSs). These interactions between proteins and DNA usually take place upstream from the gene, close to the transcription start site (TSS), in the so-called promoter region of the gene. One of the biggest issues in identifying TFBSs is that a single binding protein can bind to different DNA sequences. Related DNA sequences to which the same TF can bind are grouped together into a TFBS motif.
The identification
Genome-wide transcription factor binding site/promoter databases for the analysis of gene sets and co-occurrence of transcription factor binding motifs
Srinivas Veerla, Markus Ringnér, Mattias Höglund
BMC Genomics, 2010, DOI: 10.1186/1471-2164-11-145
Abstract: We develop a strategy that efficiently produces TFBS/promoter databases based on user-defined criteria. The resulting databases constitute all genes in the Santa Cruz database and the positions for all TFBS provided by the user as position weight matrices. These databases are then used for two purposes: to identify significant TFBS in the promoters in sets of genes and to identify clusters of co-occurring TFBS. We use two criteria for significance: significantly enriched TFBS in terms of total number of binding sites for the promoters, and significantly present TFBS in terms of the fraction of promoters with binding sites. Significant TFBS are identified by a re-sampling procedure in which the query gene set is compared with typically 10^5 gene lists of similar size randomly drawn from the TFBS/promoter database. We apply this strategy to a large number of published ChIP-Chip data sets and show that the proposed approach faithfully reproduces ChIP-Chip results. The strategy also identifies relevant TFBS when analyzing gene signatures obtained from the MSigDB database. In addition, we show that several TFBS are highly correlated and that co-occurring TFBS define functionally related sets of genes. The presented approach of promoter analysis faithfully reproduces the results from several ChIP-Chip and MSigDB derived gene sets and hence may prove to be an important method in the analysis of gene signatures obtained through ChIP-Chip or global gene expression experiments. We show that TFBS are organized in clusters of co-occurring TFBS that together define highly coherent sets of genes. The use of global gene expression profiling is a well established approach to characterize biological states or responses. One of the major goals of these investigations is to identify sets of genes with similar expression patterns that may shed new light on the underlying biological process leading to the observed states.
A logical and systematic next step is to reduce the identified gene s
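The re-sampling procedure mentioned in the abstract above, in which a query gene set is compared with random gene lists of the same size, might look roughly like this. It is a hedged sketch rather than the authors' implementation: the `site_counts` mapping, the choice of total site count as the statistic, the default of 1000 resamples, and the add-one correction are all assumptions made for illustration.

```python
import random


def resampling_p_value(query_genes, site_counts, n_resamples=1000, seed=0):
    """Empirical p-value for enrichment of binding sites in a gene set.

    site_counts is a hypothetical mapping from gene name to the number of
    binding sites for one TFBS in that gene's promoter.
    """
    rng = random.Random(seed)
    genes = list(site_counts)
    observed = sum(site_counts[g] for g in query_genes)
    hits = 0
    for _ in range(n_resamples):
        # Draw a random gene list of the same size as the query set.
        sample = rng.sample(genes, len(query_genes))
        if sum(site_counts[g] for g in sample) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)
```

The second criterion in the abstract, the fraction of promoters with at least one site, could be tested the same way by replacing the summed counts with a count of genes whose value is non-zero.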
Transcription Factor Map Alignment of Promoter Regions
Enrique Blanco, Xavier Messeguer, Temple F Smith, Roderic Guigó
PLOS Computational Biology, 2006, DOI: 10.1371/journal.pcbi.0020049
Abstract: We address the problem of comparing and characterizing the promoter regions of genes with similar expression patterns. This remains a challenging problem in sequence analysis, because often the promoter regions of co-expressed genes do not show discernible sequence conservation. In our approach, thus, we have not directly compared the nucleotide sequence of promoters. Instead, we have obtained predictions of transcription factor binding sites, annotated the predicted sites with the labels of the corresponding binding factors, and aligned the resulting sequences of labels—to which we refer here as transcription factor maps (TF-maps). To obtain the global pairwise alignment of two TF-maps, we have adapted an algorithm initially developed to align restriction enzyme maps. We have optimized the parameters of the algorithm in a small, but well-curated, collection of human–mouse orthologous gene pairs. Results in this dataset, as well as in an independent much larger dataset from the CISRED database, indicate that TF-map alignments are able to uncover conserved regulatory elements, which cannot be detected by the typical sequence alignments.
A Discriminative Approach for Unsupervised Clustering of DNA Sequence Motifs
Philip Stegmaier, Alexander Kel, Edgar Wingender, Jürgen Borlak
PLOS Computational Biology, 2013, DOI: 10.1371/journal.pcbi.1002958
Abstract: Algorithmic comparison of DNA sequence motifs is a problem in bioinformatics that has received increased attention in recent years. Its main applications concern characterization of potentially novel motifs and clustering of a motif collection in order to remove redundancy. Despite growing interest in motif clustering, the question of which motif clusters to aim for has so far not been systematically addressed. Here we analyzed motif similarities in a comprehensive set of vertebrate transcription factor classes. For this we developed enhanced similarity scores by inclusion of the information coverage (IC) criterion, which evaluates the fraction of information an alignment covers in aligned motifs. A network-based method enabled us to identify motif clusters with high correspondence to DNA-binding domain phylogenies and prior experimental findings. Based on this analysis we derived a set of motif families representing distinct binding specificities. These motif families were used to train a classifier which was further integrated into a novel algorithm for unsupervised motif clustering. Application of the new algorithm demonstrated its superiority to previously published methods and its ability to reproduce entrained motif families. As a result, our work proposes a probabilistic approach to decide whether two motifs represent common or distinct binding specificities.
Variable structure motifs for transcription factor binding sites
John E Reid, Kenneth J Evans, Nigel Dyer, Lorenz Wernisch, Sascha Ott
BMC Genomics, 2010, DOI: 10.1186/1471-2164-11-30
Abstract: We re-analysed binding sites from the TRANSFAC database and found motivating examples where our new variable length model provides a better fit. We analysed several ChIP-seq data sets with a novel motif search algorithm and compared the results to one of the best standard PWM finders and a recently developed alternative method for finding motifs of variable structure. All the methods performed comparably in held-out cross validation tests. Known motifs of variable structure were recovered for p53, Stat5a and Stat5b. In addition our method recovered a novel generalised version of an existing PWM for Sp1 that allows for variable length binding. This motif improved classification performance. We have presented a new gapped PWM model for variable length DNA binding sites that is neither too restrictive nor over-parameterised. Our comparison with existing tools shows that on average it does not have better predictive accuracy than existing methods. However, it does provide more interpretable models of motifs of variable structure that are suitable for follow-up structural studies. To our knowledge, we are the first to apply variable length motif models to eukaryotic ChIP-seq data sets and consequently the first to show their value in this domain. The results include a novel motif for the ubiquitous transcription factor Sp1. This paper examines the problem of modelling and discovering sequence motifs for transcription factors that exhibit flexible DNA binding preferences. Transcriptional regulation is an important part of regulatory control in eukaryotes. Experimental techniques to determine which transcription factors bind which loci in particular cell types under specific conditions are improving at a rapid rate. However, we are a long way from determining the binding sites of all transcription factors in all conditions. Until we have this experimental data, mathematical models of binding sites will help us predict TFBSs and in turn help us infer regulatory effects.
These mode
Identifying combinatorial regulation of transcription factors and binding motifs
Mamoru Kato, Naoya Hata, Nilanjana Banerjee, Bruce Futcher, Michael Q Zhang
Genome Biology, 2004, DOI: 10.1186/gb-2004-5-8-r56
Abstract: Here we use a novel method that integrates chromatin immunoprecipitation (ChIP) data with microarray expression data and with combinatorial TF-motif analysis. We systematically identify combinations of transcription factors and of motifs. The various combinations of TFs involved multiple binding mechanisms. We reconstruct a new combinatorial regulatory map of the yeast cell cycle in which cell-cycle regulation can be drawn as a chain of extended TF modules. We find that the pairwise combination of a TF for an early cell-cycle phase and a TF for a later phase is often used to control gene expression at intermediate times. Thus the number of distinct times of gene expression is greater than the number of transcription factors. We also see that some TF modules control branch points (cell-cycle entry and exit), and in the presence of appropriate signals they can allow progress along alternative pathways. Combining different data sources can increase statistical power as demonstrated by detecting TF interactions and composite TF-binding motifs. The original picture of a chain of simple cell-cycle regulators can be extended to a chain of composite regulatory modules: different modules may share a common TF component in the same pathway or a TF component cross-talking to other pathways. Gene expression is controlled by combinatorial interaction of transcription factors (TFs) and their binding motifs in DNA. Recent advances in genomic technology such as the DNA microarray have allowed systematic investigation of combinatorial control. However, the classic approach in microarray analysis is to cluster gene-expression patterns and to identify individual DNA sequence motifs specific to each expression cluster [1-5].
The limitations of this approach are: it does not directly address combinatorial regulation by transcription factors; it does not identify the relevant transcription factor(s) even if an over-represented motif is found; and, because it uses a limited amount of inform
Negative selection maintains transcription factor binding motifs in human cancer
I. E. Vorontsov, I. V. Kulakovskiy, G. Khimulya, E. N. Lukianova, D. D. Nikolaeva, I. A. Eliseeva, V. J. Makeev
Quantitative Biology, 2015
Abstract: Somatic mutations in cancer cells affect various genomic elements, disrupting important cell functions. In particular, mutations in DNA binding sites recognized by transcription factors can alter regulator binding affinities and expression of target genes. A number of promoter mutations have been linked with an increased risk of cancer, and mutations in binding sites of selected transcription factors have been found under positive selection. However, negative selection of mutations in coding regions is elusive, and the significance of negative selection in non-coding regions remains controversial. Here we present an analysis of transcription factors with binding sites co-localized with non-coding variants. To avoid statistical bias we account for mutation signatures of different cancer types. For many transcription factors, including multiple members of the FOX, HOX, and NR families, we show that human cancers accumulate fewer mutations that increase or decrease the affinity of binding motifs than expected by chance. Such conservation of motifs is even more pronounced in DNase accessible regions. Our data demonstrate negative selection against binding site alterations and suggest that this selection pressure protects cancer cells from rewiring of regulatory circuits. Further analysis of transcription factors and the respective conserved binding motifs can reveal cell regulatory pathways crucial for the survivability of various human cancers.