%0 Journal Article
%T A short survey of computational analysis methods in analysing ChIP-seq data
%A Hyunmin Kim
%A Jihye Kim
%A Heather Selby
%A Dexiang Gao
%A Tiejun Tong
%A Tzu Lip Phang
%A Aik Choon Tan
%J Human Genomics
%D 2011
%I BioMed Central
%R 10.1186/1479-7364-5-2-117
%X The regulation of gene expression is tightly controlled by transcription factors (TFs) that bind to specific DNA regulatory regions, histone modifications and positioned nucleosomes in the genome. High-throughput chromatin immunoprecipitation (ChIP) followed by massively parallel next-generation sequencing (ChIP-seq) represents a current approach in profiling genome-wide protein-DNA interactions, histone modifications and nucleosome positions. This new technology has marked advantages over microarray-based (ChIP-chip) approaches by offering higher specificity, sensitivity and coverage for locating TF occupancy or epigenetic markers across the genome. ChIP-seq experiments generate large amounts of data (in the order of tens of millions of reads), thus creating a bottleneck for data analysis and interpretation. Consequently, effective bioinformatics tools are needed to process, analyse and interpret these data in order to uncover underlying biological regulatory mechanisms. In essence, the ChIP-seq analysis workflow can be divided into the following steps: (i) Pre-processing. The goal of this step is to filter out erroneous and low-quality reads and to ensure that only the highest quality sequencing reads are retained for the subsequent mapping step; (ii) Mapping. This is the key step in which mapped reads are converted to an integer count of 'tags' at each position in the genome with fixed read length. The choice of flexibility options on mapping multiple reads to multiple sites affects sensitivity and specificity dependent on the volume and complexity of target genome sequences. The user can increase specificity using unique reads only or can increase sensitivity allowing multiple alignments of reads; (iii) Peak finding. This is the most challenging step in the analysis workflow, as the goal is to identify significant peak signals among background signals.
This includes not only finding the strong signals, but also finding the statistically reproducible weak signals ob
%K ChIP-seq analysis
%K Next-generation sequencing
%K comparative analysis
%K bioinformatics
%U http://www.humgenomics.com/content/5/2/117