BMC Bioinformatics 2008

Finding sequence motifs with Bayesian models incorporating positional information: an application to transcription factor binding sites

DOI: 10.1186/1471-2105-9-262

Nak-Kyeong Kim, Kannan Tharakaraman, Leonardo Mariño-Ramírez, John L Spouge

Abstract:

A-GLAM, a user-friendly computer program for identifying sequence motifs, now incorporates a Bayesian model systematically combining sequence and positional information. A-GLAM's predictions with and without positional information were compared on two human TFBS datasets, each containing sequences corresponding to the interval [-2000, 0] bases upstream of a known transcription start site (TSS). A rigorous statistical analysis showed that positional information significantly improved the prediction of sequence motifs, and an extensive cross-validation study showed that A-GLAM's model was robust against mild misspecification of its parameters. As expected, when sequences in the datasets were successively truncated to the intervals [-1000, 0], [-500, 0] and [-250, 0], positional information aided motif prediction less and less, but never hurt it significantly.

Although sequence truncation is a viable strategy when searching for biologically active motifs with a positional preference, a probabilistic model (used reasonably) generally provides a superior and more robust strategy, particularly when the sequence motifs' positional preferences are not well characterized.

Transcription factor binding sites (TFBSs) provide a specific example of biologically functional sequence motifs that sometimes have positional preferences. TFBSs contribute substantially to the control of gene expression, and because of their biological importance, much experimental effort has been expended in identifying them. Because experimental identification is expensive, there are now many computational tools that identify TFBSs as the subsequences, or "motifs", common to a set of sequences. Most TFBSs correspond to short and imprecise motifs [1], however, so all computational tools in a recent contest performed rather poorly in identifying known TFBSs [2].

Although some tools have an ad hoc basis [3-5], other tools have a basis in the calculus of probability, and can therefore immediately and systematically combine sequence with
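The idea of combining sequence and positional evidence in one probabilistic score can be illustrated with a toy sketch. It is not A-GLAM's model or code: the motif matrix `MOTIF`, the uniform background, the Gaussian-shaped positional prior, and all function names are assumptions made up for illustration, and positions are plain string indices rather than coordinates relative to a TSS. Each candidate site is scored by its sequence log-odds plus the log of a positional prior, so that position can break a tie between two equally good sequence matches.

```python
import math

# Hypothetical 4-column motif model (per-position base probabilities).
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def sequence_log_odds(window):
    """Log-likelihood ratio of the motif model vs. background for one window."""
    return sum(math.log(col[base] / BACKGROUND) for col, base in zip(MOTIF, window))


def position_log_prior(n_sites, preferred, spread):
    """Log of a normalized, Gaussian-shaped prior over candidate start indices."""
    weights = [math.exp(-0.5 * ((i - preferred) / spread) ** 2) for i in range(n_sites)]
    total = sum(weights)
    return [math.log(w / total) for w in weights]


def best_site(seq, log_prior=None):
    """Start index maximizing log-odds + log prior (uniform prior if none given)."""
    n_sites = len(seq) - len(MOTIF) + 1
    if log_prior is None:
        log_prior = [-math.log(n_sites)] * n_sites
    scores = [
        sequence_log_odds(seq[i:i + len(MOTIF)]) + log_prior[i]
        for i in range(n_sites)
    ]
    # max() returns the first index among ties.
    return max(range(n_sites), key=scores.__getitem__)


seq = "AGCTTTTTTTTTAGCT"  # two identical matches, at indices 0 and 12
prior = position_log_prior(len(seq) - len(MOTIF) + 1, preferred=12, spread=2.0)
print(best_site(seq), best_site(seq, prior))
```

Under a uniform prior the two matches score identically and the first one is returned; with the positional prior, the match nearer the preferred index wins. In this sketch, truncating the sequence would correspond to a hard prior that assigns zero probability outside a window, whereas the graded prior only down-weights distant positions.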
