%0 Journal Article
%T EvDTree: structure-dependent substitution profiles based on decision tree classification of 3D environments
%A Jean-Christophe Gelly
%A Laurent Chiche
%A Jér？me Gracy
%J BMC Bioinformatics
%D 2005
%I BioMed Central
%R 10.1186/1471-2105-6-4
%X Decision trees for residue substitutions were constructed for each residue type from sequence-structure alignments extracted from the HOMSTRAD database. For each tree cluster, environment-dependent substitution profiles were derived. The resulting structure-dependent substitution scores were assessed using a criterion based on the mean ranking of observed substitution among all possible substitutions and in sequence-structure alignments. The automatically built EvDTree substitution scores provide significantly better results than conventional matrices and similar or slightly better results than other structure-dependent matrices. EvDTree has been applied to small disulfide-rich proteins as a test case to automatically derive specific substitutions scores providing better results than non-specific substitution scores. Analyses of the decision tree classifications provide useful information on the relative importance of different structural descriptors.We propose a fully automatic method for the classification of structural environments and inference of structure-dependent substitution profiles. We show that this approach is more accurate than existing methods for various applications. The easy adaptation of EvDTree to any specific data set opens the way for class-specific structure-dependent substitution scores which can be used in threading-based remote homology searches.With the sequencing of entire genomes and the exponential growth of sequence databases on the one hand, and the significant number of known folds compared to the putative number of possible folds in the fold space on the other hand, sequence-structure comparison is currently one main challenge of the post-genomic era. To this goal, 3D-environments were used by Eisenberg and coll. in the early 90s to build statistical potentials indicating the probability of finding each amino acid in a given structural environment as described by the secondary structure, the solvent accessibility and the polarity of
%U http://www.biomedcentral.com/1471-2105/6/4