|
Genome Biology 2010
Genome-wide prediction of transcription factor binding sites using an integrated model

Abstract: Transcription factors (TFs) play a central role in regulating gene expression. Binding of TFs to their target loci is a key step in activating or repressing a gene. Determination of transcription factor binding sites (TFBSs) is an important but challenging problem because the DNA segments recognized by TFs are often short and dispersed in the genome [1]. In addition, the target loci of a TF vary depending on tissue, stage of development or physiological condition. Such condition-dependent regulation makes the problem even more challenging.

Both experimental and computational technologies have been developed to identify TFBSs. Chromatin immunoprecipitation (ChIP)-chip [2,3] and, more recently, ChIP-seq have become popular and powerful tools to determine TFBSs at a genome-wide scale [3-5]. Currently, a major bottleneck in applying ChIP-chip or ChIP-seq to all TFs encoded in a genome is the availability of ChIP-quality antibodies against each TF. Efforts have been made to tag every individual TF, but the success of tagging techniques has only been shown for a limited number of TFs in mammalian genomes.

Many computational methods [6-15] (for a survey, see [16]) have been developed to identify DNA segments recognized by TFs. These DNA motifs are often represented by a position-specific scoring matrix (PSSM) [17] that reflects the preference of nucleotides at each position. Because simply matching such DNA motifs in the genome generates too many false positives, additional information, such as co-localization and conservation of TFBSs, is often included to improve prediction accuracy. Methods such as Comet [18], Cluster-Buster [19] and ModuleMiner [20] use motifs documented in databases (for example, JASPAR [21] and TRANSFAC [22]) or predicted by de novo motif finding algorithms, and search for clusters of TFBSs.
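
To make the PSSM representation concrete, the following is a minimal sketch of how a log-odds matrix might be built from aligned site counts and used to scan a sequence. It is an illustration only, not the method of any tool cited here: the function names, the pseudocount, the uniform background frequency and the toy counts are all assumptions chosen for the example.

```python
import math

BASES = "ACGT"

def build_pssm(counts, background=0.25, pseudocount=1.0):
    """Turn per-position base counts into log2-odds scores.

    counts: list of dicts, one per motif position, mapping base -> count.
    background and pseudocount are illustrative assumptions.
    """
    pssm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        pssm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pssm

def scan(sequence, pssm, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pssm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits

# Hypothetical counts from 10 aligned sites that all read TGAC.
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
]
toy_pssm = build_pssm(toy_counts)
toy_sequence = "AATGACCGTGACTT"
toy_hits = scan(toy_sequence, toy_pssm, threshold=5.0)
```

Lowering the threshold in such a scan admits more partial matches. On a genome-scale sequence, a short motif matches at many positions by chance alone, which is why matching alone yields many false positives and why clustering or conservation evidence is added.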
Methods like Stubb [23] and EEL [24] also include motif conservation information in addition to TFBS clustering. Other methods such as CisModule [25] and
|