Tandem machine learning for the identification of genes regulated by transcription factors

DOI: 10.1186/1471-2105-6-204

Abstract:

This method has been validated using models of DNA binding sites recognized by the xenobiotic-sensitive nuclear receptor, PXR/RXRα, for target genes within the human genome. An information theory-based weight matrix was first derived and refined from known PXR/RXRα binding sites. The promoter region of candidate genes was scanned with the weight matrix. A novel information density-based clustering algorithm was then used to identify clusters of information rich sites. Finally, transformed data representing metrics of location, strength and clustering of binding sites were used for classification of promoter regions using an ensemble approach involving neural networks, decision trees and Naïve Bayesian classification. The method was evaluated on a set of 24 known target genes and 288 genes known not to be regulated by PXR/RXRα. We report an average accuracy (proportion of correctly classified promoter regions) of 71%, sensitivity of 73%, and specificity of 70%, based on multiple cross-validation and the leave-one-out strategy. The performance on a test set of 13 genes showed that 10 were correctly classified.

We have developed a machine learning approach for the successful detection of gene targets for transcription factors with high accuracy. The method has been validated for the transcription factor PXR/RXRα and has the potential to be extended to other transcription factors.

Nucleic acid binding sites recognized by transcription factors are comprised of families of short, related, often degenerate sequences that share a common function. This degeneracy may be represented in the form of a position-specific weight matrix (PWM) [1,2]. In fact, PWMs have been widely applied [2,3] and several databases host them [4,5].

Using information theory, the degree of conservation of an individual member (both known and predicted) of that family and its corresponding weight matrix may be quantified in terms of bits of information [1]. The strength of experimental binding has been s
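The weight-matrix derivation and promoter scanning steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes equiprobable background bases, scores each position as 2 + log2 f(b, l) bits, uses a pseudocount in place of a small-sample correction, and omits the matrix refinement, clustering and classification stages. The function names, the `pseudocount` and `min_bits` parameters, and the toy hexamer sites are all hypothetical and are not curated PXR/RXRα binding sites.

```python
import math

BASES = "ACGT"


def build_weight_matrix(sites, pseudocount=0.5):
    """Build per-position weights in bits from aligned, equal-length sites.

    Each weight is 2 + log2(f), where f is the pseudocount-smoothed
    frequency of a base at that position (assumption: uniform background).
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for pos in range(length):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append(
            {base: 2.0 + math.log2(counts[base] / total) for base in BASES}
        )
    return matrix


def scan(sequence, matrix, min_bits=0.0):
    """Return (offset, bits) for every window scoring above min_bits."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        bits = sum(matrix[i][base] for i, base in enumerate(window))
        if bits > min_bits:
            hits.append((start, bits))
    return hits


# Toy aligned sites, invented for illustration only.
toy_sites = ["AGGTCA", "AGTTCA", "AGGTCA", "GGGTCA"]
matrix = build_weight_matrix(toy_sites)

# Scan a short made-up "promoter" and keep windows above 5 bits.
hits = scan("TTTAGGTCATTT", matrix, min_bits=5.0)
for offset, bits in hits:
    print(f"offset {offset}: {bits:.2f} bits")
```

In a pipeline like the one the abstract outlines, the offsets and bit scores returned by such a scan would be the raw material from which location, strength and clustering features are derived for the downstream classifiers.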
