%0 Journal Article
%T Predicting transcription factor binding sites using local over-representation and comparative genomics
%A Matthieu Defrance
%A Hélène Touzet
%J BMC Bioinformatics
%D 2006
%I BioMed Central
%R 10.1186/1471-2105-7-396
%X We have developed a method, named TFM-Explorer, that searches for locally overrepresented TFBSs in a set of coregulated genes. The TFBSs are modeled by profiles provided by a database of position weight matrices. The novelty of the method is that it takes advantage of spatial conservation in the sequence and supports multiple species. The efficiency of the underlying algorithm and its robustness to noise allow weak regulatory signals to be detected in large heterogeneous data sets. TFM-Explorer provides an efficient way to predict TFBS overrepresentation in related sequences. Promising results were obtained in a variety of examples in human, mouse, and rat genomes. The software is publicly available at http://bioinfo.lifl.fr/TFM-Explorer. The computational identification of functional transcription factor (TF) binding sites (TFBSs) from a nucleotide sequence alone is difficult. TFBSs are usually short (around 5–15 bases) and degenerate, and hence potential binding sites can occur very frequently by chance, leading to a high level of false positives in the predicted sites. Wasserman and Sandelin have termed this the futility theorem, since nearly 100% of predicted TFBSs have no function in vivo [1]. Solving this problem is crucial for mammalian genomes that contain large noncoding regions. Phylogenetic footprinting can significantly increase the accuracy of TFBS predictions. If a region is conserved between sequences from distantly related organisms, it is likely to be subject to greater selection pressure and to have a biological role. Phylogenetic footprinting methods are based on the assumption that TFBSs are located in conserved regions that can be detected by alignment algorithms.
A current limitation for mammalian organisms is that when nothing is known about the motif, the number of orthologous sequences at the correct evolutionary distance needs to be high [2]. Another potentially fruitful approach for improving the accuracy of TFBS prediction is to use a set
%U http://www.biomedcentral.com/1471-2105/7/396