|
Prediction of mono- and di-nucleotide-specific DNA-binding sites in proteins using neural networks

Abstract: Significant differences in residue preferences for specific contacts are observed, which, combined with other features, lead to promising levels of prediction. In general, PSSM-based predictions, supported by secondary structure and solvent accessibility, achieve good predictability, with an area under the ROC curve (AUC) of ~70–80%. The major and minor groove contact predictions stood out for their poor predictability from sequence or PSSM alone; adding secondary structure and solvent accessibility information improved them by more than 20 percentage points, revealing a predominant role of local protein structure in major/minor groove DNA recognition. Following a detailed analysis of the results, a web server to predict mononucleotide and dinucleotide-step contacts using PSSM was developed and made available at http://sdcpred.netasa.org/ or http://tardis.nibio.go.jp/netasa/sdcpred/.

Most residue-nucleotide contacts can be predicted with high accuracy using only sequence and evolutionary information. Major and minor groove contacts, however, depend profoundly on the local structure. Overall, this study takes us a step closer to the ultimate goal of predicting mutual recognition sites in protein and DNA sequences.

Protein-DNA interactions have been the subject of extensive investigation in recent years [1-8]. Some of these studies have focussed on predicting transcription factor binding sites on DNA [9-11], whereas others focus on predicting whether a novel protein is potentially DNA-binding [12-14]. Earlier, we analyzed the sequence and structural features of DNA-binding sites in proteins and developed methods for their prediction using neural networks [15,16].
Similar and more accurate methods have since been reported [17-20].

Although these methods have been successful in quickly identifying DNA-binding residues, they all fall short of predicting specific protein-DNA interactions. So far, the predic
|