|
Plant Methods 2011
ChIP-seq Analysis in R (CSAR): An R package for the statistical detection of protein-bound genomic regions

Abstract: We present here an R package for the statistical analysis of ChIP-seq experiments. Taking the average size of DNA fragments subjected to sequencing into account, the software calculates single-nucleotide read-enrichment values. After normalization, sample and control are compared using a test based on the ratio test or the Poisson distribution. Test statistic thresholds to control the false discovery rate are obtained through random permutations. Computational efficiency is achieved by implementing the most time-consuming functions in C++ and integrating these in the R package. An analysis of simulated and experimental ChIP-seq data is presented to demonstrate the robustness of our method against PCR artefacts and its adequate control of the error rate.

The software ChIP-seq Analysis in R (CSAR) enables fast and accurate detection of protein-bound genomic regions through the analysis of ChIP-seq experiments. Compared to existing methods, we found that our package shows greater robustness against PCR artefacts and better control of the error rate.

Genome-wide identification of in vivo protein-bound genomic regions is essential for a full understanding of transcriptional regulation. DNA fragments that are bound by proteins in vivo can be isolated by chromatin immunoprecipitation (ChIP) and subsequently identified using microarrays (ChIP-chip) or high-throughput sequencing technologies (ChIP-seq). Recent studies [1,2] indicate that the ChIP-seq approach provides higher resolution and statistical power than ChIP-chip. To date, only two methods have been described for the analysis of ChIP-seq experiments in plants, i.e. [3] and the method developed by our group [2,4].

The common approach to analyzing the millions of short sequence reads obtained in a typical ChIP-seq experiment is to map them to a reference genome using one of the several mapping tools available, for example SOAPv2, Bowtie, or BWA [5-7]. Reads that map to multiple locations in the genome, so-called 'multireads' [
|