%0 Journal Article
%T A method for probabilistic mapping between protein structure and function taxonomies through cross training
%A Kshitiz Gupta
%A Vivek Sehgal
%A Andre Levchenko
%J BMC Structural Biology
%D 2008
%I BioMed Central
%R 10.1186/1472-6807-8-40
%X We demonstrate that PROSITE and SCOP have significant semantic overlap, in spite of independent classification schemes. By training classifiers of SCOP using classes of PROSITE as attributes and vice versa, the accuracy of Support Vector Machine classifiers for both SCOP and PROSITE was improved. Novel attributes, 2-D elastic profiles and Blocks, were used to improve time complexity and accuracy. Many relationships were extracted between classes of SCOP and PROSITE using decision trees. We demonstrate that the presented approach can discover new probabilistic relationships between classes of different taxonomies and render a more accurate classification. Extensive mappings between existing protein classification databases can be created to link the large amount of organized data. Probabilistic maps were created between classes of SCOP and PROSITE, allowing predictions of structure using function, and vice versa. In our experiments, we also found that functions are indeed more strongly related to structures than structures are to functions. The function and 3D structure of proteins are said to be related to each other [3]. However, prediction of function on the basis of structure, and vice versa, remains only a partially solved problem, and is largely in the domain of biophysics and biochemistry [4]. This underlines the need for computational and bioinformatics methods to establish relationships between the functions and structures of proteins. Previous attempts have been largely limited to examining a single protein and predicting structure and function based on its size, charge, sequence, and other physical attributes [5-7].
Further, content knowledge of protein classification has also been used to predict structure and function using data mining techniques [8-10]. Large protein classification schemes (e.g. SCOP [1], CATH [11], PROSITE [2], Pfam [12]) are available in the public domain in the form of protein classification databases. Arguably, this latent knowledge has not been sufficien
%U http://www.biomedcentral.com/1472-6807/8/40