|
TIM-Finder: A new method for identifying TIM-barrel proteins

Abstract: To develop a new method for identifying TIM-barrel proteins, we considered three descriptors: a sequence-alignment-based descriptor using PSI-BLAST E-values and bit scores, a descriptor based on secondary structure element alignment (SSEA), and a descriptor based on the occurrence of PROSITE functional motifs. The three descriptors were combined using a Support Vector Machine (SVM) to obtain a new method with improved performance, which we call TIM-Finder. When tested on the whole proteome of Bacillus subtilis, TIM-Finder detected 194 TIM-barrel proteins at a 99% confidence level, outperforming the PSI-BLAST search as well as one existing fold recognition method. TIM-Finder can serve as a competitive tool for proteome-wide TIM-barrel protein identification. The TIM-Finder web server is freely accessible at http://202.112.170.199/TIM-Finder/.

Proteins have complex three-dimensional (3D) shapes, a fact well demonstrated by the more than 60,000 experimentally determined structures deposited in the current PDB database (http://www.rcsb.org/pdb/home/home.do). The number of unique protein folds (or architectural types) should be much smaller than the number of protein families defined by sequence similarity [1]. As more structures are determined, it is also becoming increasingly clear that proteins are not evenly distributed among the different folds [2]. Although many folds have so far been observed in only a few proteins, some protein folds (known as superfolds) occur frequently. As reported by Salem et al. (1999), the top ten superfolds could account for approximately one third of all proteins in the PDB database.

One of the top ten superfolds is the triosephosphate isomerase (TIM)-barrel fold (Figure 1A). 
It was first observed in triosephosphate isomerase and consists of eight α-helices on the outside and eight parallel β-strands on the inside that alternate along the peptide backbone [3]. In the past, many protein structures w
|