|
BMC Bioinformatics 2005
Better prediction of protein contact number using a support vector regression analysis of amino acid sequenceAbstract: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either "contacted" or "non-contacted", the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds.The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary protein sequence and higher order consecutive protein structural and functional properties.Prediction of protein three-dimensional structure from primary sequence is the central problem in structural bioinformatics. One approach is to use known structur∈homolog proteins as templates to determine the tertiary structures of novel proteins of unknown structure. Approaches include comparative modelling, threading and fold recognition methods. One protein structural feature is of particular interest here, namely, residue contact number (CN) which can be used to enhance protein fold recognition [1]. This measure has also been regarded as the conserved solvent exposure descriptor of similar folds without a common evolutionary origin [2]. Furthermore, contact number may be used to determine the energy function allowing molecular dynamics simulations of protein structures [3]. Here, we seek to use protein contact number to
|