%0 Journal Article
%T Improving protein structure similarity searches using domain boundaries based on conserved sequence information
%A Kenneth Thompson
%A Yanli Wang
%A Tom Madej
%A Stephen H Bryant
%J BMC Structural Biology
%D 2009
%I BioMed Central
%R 10.1186/1472-6807-9-33
%X Alternative domains, which have significantly different secondary structure composition from those based on structurally compact units, were identified based on the alignment footprints of curated protein sequence domain families. Our analysis indicates that domain boundaries disagree on roughly 8% of protein chains in the medium redundancy subset of the Molecular Modeling Database (MMDB). These conflicting sequence based domain boundaries perform slightly better than structure domains in structure similarity searches, and there are interesting cases when structure similarity search performance is markedly improved. Structure similarity searches using domain boundaries based on conserved sequence information can provide an additional method for investigators to identify interesting similarities between proteins with known structures. Because of the improvement in performance of structure similarity searches using sequence domain boundaries, we are in the process of implementing their inclusion into the VAST search and MMDB resources in the NCBI Entrez system. As the amount of diverse biological data continues to grow, it is important for new methods of analysis to be devised and current methods to be improved. The ability to detect that two proteins have diverged from a common ancestor allows one to infer functional similarity between the two. A common method for identifying similarity between proteins is the use of sequence alignment tools such as FASTA [1] and BLAST [2], which provide an alignment of two sequences and a score indicating whether the alignment is significant or could be attributed to chance. The comparison of protein structures allows one to peer back farther into evolutionary time, based on the concept that a form or structure remains similar long after sequence similarity has become undetectable [3-6]. There are many methods [7-15] and databases [16-19] currently available for protein structure comparisons.
While the performance of the methods and d
%U http://www.biomedcentral.com/1472-6807/9/33