Context dependent reference states of solvent accessibility derived from native protein structures and assessed by predictability analysis

Abstract: We compiled statistics of the highest observed ASA (HOA) of residues in their different sequence contexts and analyzed its distribution over all 400 possible combinations of flanking residues for each residue type. We observe that many tripeptides are more exposed than their ESA values and that residues at their HOA are often found in turn, coil and bend conformations. On the other hand, several residues are never observed in an exposure state close to their ESA values. A neural network trained with HOA-normalized data outperforms one trained with ESA-normalized values. However, the improvement is subtle for some residue types and more significant for others.

We propose HOA-based normalization of solvent accessibility derived from native structures. It improves sequence-based predictability and also yields an enrichment of interface residues on the surface. Some difference may remain between the highest possible ASA and the highest observed ASA because the PDB does not fully cover the space of the ASA distribution, which limits the overall improvement in prediction to a relatively modest degree.

Predicting protein three-dimensional structure directly from amino acid sequence is an important problem in bioinformatics. An intermediate approach is to predict so-called one-dimensional structural properties of proteins. The solvent accessibility, or accessible surface area (ASA), of an amino acid residue in a protein structure is one such property, and knowledge of it can significantly enhance overall structure and function prediction [1,2]. Given an amino acid sequence, the goal of such prediction is to estimate the ASA of each residue using ASA values previously observed in known protein structures. This knowledge from previously observed structures is modeled using machine learning and other methods [3-16].
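The HOA normalization summarized in the abstract can be illustrated with a minimal Python sketch. It assumes that a residue's "context" is the tripeptide formed with its two sequence neighbours (consistent with 400 = 20 × 20 flanking combinations per residue type), that absolute per-residue ASA values have already been computed from PDB structures, and that chains are supplied as simple (sequence, ASA list) pairs. The function names and input format are hypothetical, terminal residues are simply skipped, and this is not the authors' actual pipeline.

```python
from collections import defaultdict


def tripeptide_contexts(seq):
    """Yield (index, tripeptide) for every residue that has both neighbours.

    Assumption: chain-terminal residues have no full context and are skipped.
    """
    for i in range(1, len(seq) - 1):
        yield i, seq[i - 1 : i + 2]


def highest_observed_asa(chains):
    """Return the highest observed ASA (HOA) for each tripeptide context.

    `chains` is a hypothetical list of (sequence, asa) pairs, where asa[i]
    is the absolute ASA of residue i in that chain.
    """
    hoa = defaultdict(float)
    for seq, asa in chains:
        for i, ctx in tripeptide_contexts(seq):
            hoa[ctx] = max(hoa[ctx], asa[i])
    return dict(hoa)


def normalize(seq, asa, reference):
    """Relative ASA of each non-terminal residue, using a per-context reference.

    Returns None for a residue whose context was never observed (or was only
    ever observed fully buried, i.e. reference 0), avoiding division by zero.
    """
    rsa = []
    for i, ctx in tripeptide_contexts(seq):
        ref = reference.get(ctx)
        rsa.append(asa[i] / ref if ref else None)
    return rsa


if __name__ == "__main__":
    # Toy ASA values for illustration only, not real measurements.
    chains = [
        ("AGKLA", [50.0, 80.0, 120.0, 60.0, 90.0]),
        ("GGKLG", [70.0, 85.0, 150.0, 40.0, 100.0]),
    ]
    hoa = highest_observed_asa(chains)
    print(normalize("AGKLA", [50.0, 80.0, 120.0, 60.0, 90.0], hoa))  # [1.0, 0.8, 1.0]
```

Because each context's reference is the maximum ASA seen in the training structures, HOA-normalized values cannot exceed 1 on those structures, whereas a fixed ESA reference per residue type can give values above 1 for the tripeptides that are more exposed than ESA.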
Various methods for predicting ASA from sequence or sequence-derived evolutionary information have been developed, such as neural networks [8-12], Bayesia