Nucleosomes, which consist of DNA wrapped around histone octamers, are dynamic: their location, size, and occupancy can change. Nucleosomes can regulate gene expression by controlling the accessibility of DNA to proteins. Next-generation sequencing, combined with laboratory methods such as micrococcal nuclease digestion, makes it possible to predict the genomic locations of nucleosomes. However, the true locations of nucleosomes are unknown, and it is difficult to determine their exact locations from next-generation sequencing data. This paper proposes a novel voting algorithm, NucVoter, for the reliable prediction of nucleosome locations. Multiple models verify the consensus areas in which nucleosomes are placed by the model with the highest priority. NucVoter significantly improves the performance of nucleosome prediction.

1. Introduction

Genes within DNA are transcribed into an RNA product [1]. To be transcribed, the DNA region encoding a gene must be accessible to proteins such as transcription factors and RNA polymerase [2]. As shown in Figure 1, a nucleosome is composed of a DNA sequence wrapped 1.65 times around a histone octamer [3]. If the DNA region is wrapped so compactly that proteins cannot bind to it, the corresponding gene is not transcribed [4]. Therefore, nucleosomes can regulate gene expression by restricting or facilitating the access of proteins to DNA.

Figure 1: Organization of nucleosomes, linkers, and DNA. A nucleosome is composed of DNA wrapped around a histone octamer. H indicates a histone octamer. Nucleosomes are connected by linker DNA. DNA is double stranded; the forward strand runs in the 5′ to 3′ direction, while the reverse strand runs in the opposite direction.

Figure 2 shows the profile of typical nucleosomes around the transcription start sites (TSSs) of yeast genes.
The most prevalent size of nucleosomes is approximately 147 base pairs (bp), and the length of linker DNA between nucleosomes is approximately 18 bp [3]. The occupancy of a nucleosome represents the likelihood that a nucleosome resides at a particular genomic location. The so-called −1 nucleosome is the first nucleosome upstream of the TSS. The area downstream of the −1 nucleosome is the nucleosome-free region (NFR), which shows very low nucleosome occupancy over approximately 150 bp on average [5]. The NFR contains transcription factor binding sites and is therefore important in transcription regulation [6]. The first nucleosome downstream of the NFR is the +1 nucleosome, followed by the +2, +3,
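
The naming convention described above can be illustrated with a short sketch. It is not part of NucVoter; the function names `label_nucleosomes` and `nfr_width`, the use of nucleosome center coordinates, and the rule that a nucleosome counts as upstream when its center lies before the TSS are all assumptions made for illustration. Only the 147 bp nucleosome size and the 18 bp linker length come from the text.

```python
from typing import Dict, List, Optional

NUCLEOSOME_BP = 147  # typical nucleosome size cited in the text
LINKER_BP = 18       # typical linker length cited in the text


def label_nucleosomes(centers: List[int], tss: int, strand: str = "+") -> Dict[int, str]:
    """Map each nucleosome center to a label such as '-1', '+1', '+2'.

    Assumption (illustrative only): a nucleosome is upstream when its
    center lies before the TSS in the direction of transcription.
    """
    sign = 1 if strand == "+" else -1
    # Signed distance of each center from the TSS along the transcribed strand.
    offsets = {c: sign * (c - tss) for c in centers}
    # Upstream nucleosomes are numbered outward from the TSS: -1, -2, ...
    upstream = sorted((c for c in centers if offsets[c] < 0), key=lambda c: -offsets[c])
    # Downstream nucleosomes are numbered outward from the TSS: +1, +2, ...
    downstream = sorted((c for c in centers if offsets[c] >= 0), key=lambda c: offsets[c])
    labels = {c: f"-{i}" for i, c in enumerate(upstream, start=1)}
    labels.update({c: f"+{i}" for i, c in enumerate(downstream, start=1)})
    return labels


def nfr_width(centers: List[int], tss: int, strand: str = "+") -> Optional[int]:
    """Gap in bp between the facing edges of the -1 and +1 nucleosomes."""
    by_label = {lab: c for c, lab in label_nucleosomes(centers, tss, strand).items()}
    if "-1" not in by_label or "+1" not in by_label:
        return None
    # Center-to-center distance minus two half-nucleosomes (one full size).
    return abs(by_label["+1"] - by_label["-1"]) - NUCLEOSOME_BP


# Hypothetical coordinates: downstream nucleosomes spaced 147 + 18 = 165 bp apart.
centers = [685, 850, 1147, 1312, 1477]
labels = label_nucleosomes(centers, tss=1000)
width = nfr_width(centers, tss=1000)
```

With these made-up coordinates, the nucleosomes centered at 850 and 1147 are labeled −1 and +1, and the gap between them is 150 bp, which matches the average NFR width quoted above. Real pipelines may define the +1 nucleosome differently, for example by overlap with the TSS.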
References
[1]
J. C. Rajapakse and S. L. Ho, “Markov/neural model for Eukaryotic promoter recognition,” in Machine Learning in Bioinformatics, Y. Zhang and J. C. Rajapakse, Eds., pp. 283–299, John Wiley & Sons, New York, NY, USA, 2009.
[2]
S. Draghici, Statistics and Data Analysis for Microarrays Using R and Bioconductor, CRC Press, New York, NY, USA, 2012.
[3]
C. Jiang and B. F. Pugh, “Nucleosome positioning and gene regulation: advances through genomics,” Nature Reviews Genetics, vol. 10, no. 3, pp. 161–172, 2009.
[4]
F. Zambelli and G. Pavesi, “Algorithmic issues in the analysis of Chip-seq data,” in Algorithms in Computational Molecular Biology, M. Elloumi and A. Y. Zomaya, Eds., pp. 425–448, John Wiley & Sons, New York, NY, USA, 2011.
[5]
L. Bai and A. V. Morozov, “Gene regulation by nucleosome positioning,” Trends in Genetics, vol. 26, no. 11, pp. 476–483, 2010.
[6]
O. Bell, V. K. Tiwari, N. H. Thomä, and D. Schübeler, “Determinants and dynamics of genome accessibility,” Nature Reviews Genetics, vol. 12, no. 8, pp. 554–564, 2011.
[7]
G. Felsenfeld and M. Groudine, “Controlling the double helix,” Nature, vol. 421, no. 6921, pp. 448–453, 2003.
[8]
K. Cui and K. Zhao, “Genome-wide approaches to determining nucleosome occupancy in metazoans using MNase-Seq,” Methods in Molecular Biology, vol. 833, pp. 413–419, 2012.
[9]
W. J. Ansorge, “Next-generation DNA sequencing techniques,” New Biotechnology, vol. 25, no. 4, pp. 195–203, 2009.
[10]
X. Zhou, L. Ren, Q. Meng, Y. Li, Y. Yu, and J. Yu, “The next-generation sequencing technology and application,” Protein and Cell, vol. 1, no. 6, pp. 520–536, 2010.
[11]
M. L. Eaton, K. Galani, S. Kang, S. P. Bell, and D. M. MacAlpine, “Conserved nucleosome positioning defines replication origins,” Genes and Development, vol. 24, no. 8, pp. 748–753, 2010.
[12]
R. M. Fraser, D. Keszenman-Pereyra, M. W. Simmen, and J. Allan, “High-resolution mapping of sequence-directed nucleosome positioning on genomic DNA,” Journal of Molecular Biology, vol. 390, no. 2, pp. 292–305, 2009.
[13]
N. Ponts, E. Y. Harris, J. Prudhomme et al., “Nucleosome landscape and control of transcription in the human malaria parasite,” Genome Research, vol. 20, no. 2, pp. 228–238, 2010.
[14]
S. Shivaswamy, A. Bhinge, Y. Zhao, S. Jones, M. Hirst, and V. R. Iyer, “Dynamic remodeling of individual nucleosomes across a eukaryotic genome in response to transcriptional perturbation,” PLoS Biology, vol. 6, no. 3, article e65, 2008.
[15]
I. Albert, S. Wachi, C. Jiang, and B. F. Pugh, “GeneTrack—a genomic data processing and visualization framework,” Bioinformatics, vol. 24, no. 10, pp. 1305–1306, 2008.
[16]
A. Nellore, K. Bobkov, E. Howe, A. Pankov, A. Diaz, and J. S. Song, “NSeq: a multithreaded Java application for finding positioned nucleosomes from sequencing data,” Frontiers in Genetics, vol. 3, article 320, 2013.
[17]
A. Weiner, A. Hughes, M. Yassour, O. J. Rando, and N. Friedman, “High-resolution nucleosome mapping reveals transcription-dependent promoter packaging,” Genome Research, vol. 20, no. 1, pp. 90–100, 2010.
[18]
I. H. Witten and E. Frank, Data Mining, Morgan Kaufmann Publishers, 2005.
[19]
O. Flores and M. Orozco, “nucleR: a package for non-parametric nucleosome positioning,” Bioinformatics, vol. 27, no. 15, Article ID btr345, pp. 2149–2150, 2011.
[20]
J. Pevsner, Bioinformatics and Functional Genomics, John Wiley & Sons, New York, NY, USA, 2009.
[21]
D. W. Huang, B. T. Sherman, and R. A. Lempicki, “Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists,” Nucleic Acids Research, vol. 37, no. 1, pp. 1–13, 2009.
[22]
M. D. Robinson, J. Grigull, N. Mohammad, and T. R. Hughes, “FunSpec: a web-based cluster interpreter for yeast,” BMC Bioinformatics, vol. 3, article 35, 2002.