%0 Journal Article %T Systematic comparison of SCOP and CATH: a new gold standard for protein structure analysis %A Gergely Csaba %A Fabian Birzele %A Ralf Zimmer %J BMC Structural Biology %D 2009 %I BioMed Central %R 10.1186/1472-6807-9-23 %X We create a new mapping between SCOP and CATH and define a consistent benchmark set which is shown to largely reduce errors made by structure comparison methods such as TM-Align and has useful further applications, e.g. for machine learning methods being trained for protein structure classification. We also extract additional connections in the topology of the protein fold space from the orthogonal features contained in SCOP and CATH. Via an all-to-all comparison, we find that there are large and unexpected differences between SCOP and CATH w.r.t. their domain definitions as well as their hierarchic partitioning of the fold space on every level of the two classifications. A consistent mapping of SCOP and CATH can be exploited for automated structure comparison and classification. Benchmark sets and an interactive SCOP-CATH browser are available at http://www.bio.ifi.lmu.de/SCOPCath. The classification and comparison of the more than 50'000 protein structures deposited in the PDB [1] (January 2009) is an essential step to extract valuable knowledge from protein structure data. Today, the two most prominent protein structure classification schemes are SCOP [2] and CATH [3]. Both partition proteins into domains. These domains are classified in a hierarchical manner: SCOP sorts protein domains into classes, folds, superfamilies and families, while the four major levels of CATH are class, architecture, topology and homologous superfamily. The SCOP database is mainly based on expert knowledge and, on the first level of the hierarchy, defines four major classes, namely all α, all β, α/β and α + β, which describe the content of secondary structure elements in the domain.
According to the SCOP authors, domains in a common fold have the same major secondary structures in the same arrangement with the same topological connections. In the same superfamily, domains share low sequence identities but their structures and, in many cases, functional features suggest th %U http://www.biomedcentral.com/1472-6807/9/23