|
Genome Biology 2000
Predicting protein phosphorylation sites
DOI: 10.1186/gb-2000-1-1-reports022

Abstract: The authors built one network to predict tyrosine phosphorylation sites, one for serines and one for threonines. They trained each network as follows. First they made a list of all proteins known (from experiment) to be phosphorylated at the relevant residue. For each protein, they identified the peptide of between nine and eleven residues that included, for example, the phosphotyrosine; these peptides served as positive controls. Then Blom et al. assumed that all other tyrosines in the proteins were not part of phosphorylation sites, so the peptides that included these tyrosines were used as negative controls.

After the authors had trained the neural networks on groups of such peptides for phosphorylated tyrosine, serine and threonine, the networks predicted 52% of known threonine phosphorylation sites, 86% of known serine sites, and 68% of known tyrosine sites in a test set of data. The authors also used serine networks to predict threonine phosphorylation sites in a test set of data, and correctly identified 81% of the known sites; when they tried the reverse experiment, predicting serine sites using threonine networks, the score was only 54%. They also predicted phosphorylation sites on the transcriptional adaptor p300/CBP, which remain to be tested by experiment.

In the last section of the paper, the authors trained another set of neural networks using predicted three-dimensional structures of phosphopeptides. The results were less accurate than those obtained using the sequences.

Users can make their own predictions with the neural networks at NetPhos. The experimental controls came from PhosphoBase.

It is hard to evaluate exactly how good the authors' neural networks are, as we do not know exactly what is in the test set that produced the prediction scores given in the paper.
But they seem to be useful tools, especially for experimentalists who are planning to confirm the predictions on new proteins. There is one major unexplained puzzle in this paper: most enz
|
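The construction of positive and negative control peptides described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `peptide_windows`, the `flank` parameter, the `X` padding used near the sequence termini, and the example sequence are all assumptions made for illustration. A `flank` of 4 gives nine-residue peptides and a `flank` of 5 gives eleven-residue peptides, matching the range mentioned in the report.

```python
from typing import List, Set, Tuple


def peptide_windows(
    sequence: str,
    residue: str,
    phospho_positions: Set[int],
    flank: int = 4,
    pad: str = "X",
) -> Tuple[List[str], List[str]]:
    """Return (positives, negatives) for every occurrence of `residue`.

    Each peptide is a window of 2 * flank + 1 residues centred on the
    candidate site. Positions are 0-based indices into `sequence`.
    Padding with `pad` at the termini is an assumption of this sketch.
    """
    padded = pad * flank + sequence + pad * flank
    positives: List[str] = []
    negatives: List[str] = []
    for i, amino_acid in enumerate(sequence):
        if amino_acid != residue:
            continue
        # Index i in `sequence` sits at i + flank in `padded`,
        # so this slice is centred on the candidate residue.
        window = padded[i:i + 2 * flank + 1]
        if i in phospho_positions:
            positives.append(window)  # experimentally known site
        else:
            negatives.append(window)  # assumed not phosphorylated
    return positives, negatives


if __name__ == "__main__":
    # Hypothetical sequence with tyrosines at indices 4 and 10;
    # only index 4 is treated as a known phosphorylation site.
    pos, neg = peptide_windows("MKTAYIAKQRYSGLT", "Y", {4})
    print("positive controls:", pos)
    print("negative controls:", neg)
```

As in the paper's setup, every tyrosine not on the known-site list is treated as a negative example, so any real but undiscovered site would be mislabelled.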