
PLOS ONE  2014 

Transcription Factor Binding Sites Prediction Based on Modified Nucleosomes

DOI: 10.1371/journal.pone.0089226



Position weight matrices (PWMs) are commonly used in computational prediction of transcription factor binding sites (TFBSs). Although these matrices predict actual binding sites more accurately than simple consensus sequences, they usually produce a large number of false positive (FP) predictions and are therefore a poor source of information when used alone. Several studies have employed additional sources of information, such as sequence conservation or proximity to transcription start sites, to distinguish true binding regions from random ones. Recently, the spatial distribution of modified nucleosomes has been shown to be associated with different promoter architectures. These aligned patterns can facilitate DNA accessibility for transcription factors. We hypothesize that data from these aligned, periodic patterns can improve the performance of binding region prediction. In this study, we propose two features, “modified nucleosomes neighboring” and “modified nucleosomes occupancy”, to reduce the number of FPs in binding site discovery. Based on these features, we designed a logistic regression classifier that estimates the probability that a region is a TFBS. The model was trained on Sp1 binding sites on chromosome 1 and tested on the other chromosomes in human CD4+ T cells. We investigated 21 histone modifications and found that only 8 of them are strongly correlated with transcription factor binding regions. To show that these features are not specific to Sp1, we combined the logistic regression classifier with the PWM to create a new model for searching for TFBSs across the genome. We tested this model on the transcription factors MAZ, PU.1 and ELF1 and compared the results with those obtained using the PWM alone. The results show that our model predicts transcription factor binding regions more successfully than the PWM alone. The relative simplicity of the model and its capacity to integrate other features make it a superior method for TFBS prediction.
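
The idea of filtering PWM hits with a logistic regression classifier can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the two nucleosome features are computed or how the classifier is combined with the PWM, so here the features are taken as precomputed numbers, the weights, bias and thresholds are made-up values rather than fitted ones, the PWM is a toy matrix, and the combination rule (both the PWM score and the classifier probability must pass a threshold) is an assumption.

```python
import math

# Toy 4-position PWM (base probabilities per position).
# Illustrative values only, NOT a real Sp1 matrix.
TOY_PWM = [
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
]


def pwm_log_odds(seq, pwm, background=0.25):
    """Sum of per-position log2 odds of the PWM against a uniform background."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match PWM length")
    return sum(math.log2(col[base] / background) for base, col in zip(seq, pwm))


def binding_probability(neighboring, occupancy, weights=(1.5, 2.0), bias=-2.0):
    """Logistic regression on the two nucleosome features.

    `neighboring` and `occupancy` stand in for the paper's "modified
    nucleosomes neighboring" and "modified nucleosomes occupancy" features;
    the weights and bias are hypothetical, not fitted values.
    """
    z = bias + weights[0] * neighboring + weights[1] * occupancy
    return 1.0 / (1.0 + math.exp(-z))


def predict_site(seq, neighboring, occupancy, pwm=TOY_PWM,
                 pwm_threshold=0.0, prob_threshold=0.5):
    """Assumed combination rule: keep a PWM hit only if the classifier agrees."""
    score = pwm_log_odds(seq, pwm)
    prob = binding_probability(neighboring, occupancy)
    return score >= pwm_threshold and prob >= prob_threshold


if __name__ == "__main__":
    # A strong motif match in a region with supporting nucleosome features.
    print(predict_site("GGCG", neighboring=1.0, occupancy=1.0))
    # The same motif match without supporting features is filtered out.
    print(predict_site("GGCG", neighboring=0.0, occupancy=0.0))
```

In this sketch the classifier acts purely as a filter on PWM hits, which is one simple way to reduce false positives; a fitted model would learn the weights from labeled binding sites, such as the Sp1 sites on chromosome 1 used for training in the study.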



