Computational approaches to predicting the disulphide bonding state and its connectivity pattern are based on various descriptors. One such descriptor is the amino acid sequence motif flanking each cysteine residue. Although disulphide bonding information exists in many databases and applications, no complete reference with motif query support is currently available. The Cysteine Motif Database (CMD) is the first online resource that stores all cysteine residues and their flanking motifs, together with secondary structure and propensity value assignments derived from laboratory data. We extracted more than 3 million cysteine motifs from PDB and UniProt data and annotated them with secondary structure assignment, propensity value assignment, frequency of occurrence, and coefficiency of their bonding status. Removal of redundancies yielded 15875 unique flanking motifs that are always bonded and 41577 unique patterns that are always nonbonded. Queries can be made by protein ID, FASTA sequence, sequence motif, or secondary structure, either individually or in batch format, and the provided APIs allow remote users to query the database from third-party software or for high-throughput screening and querying. The CMD offers extensive information about bonded and free cysteine residues and their motifs, allowing in-depth characterization of sequence motif composition.

1. Background

Disulphide bonds are formed by the oxidation of two cysteine residues in a protein. They are significant to a protein’s conformational stability: they confer greater thermal and chemical stability and stabilize structural intermediates to ensure the correct folding pathway. However, the connectivity of the disulphide bonds in a protein sequence can only be determined experimentally.
Given this difficulty, the ability to evaluate or predict the disulphide bonding state and connectivity from the sequence would be highly valuable in engineering proteins for biotechnological and medical applications. Computational approaches to disulphide connectivity prediction have been based on various descriptors. One of these descriptors is the sequence motif generated by combining the flanking residues on either side of the cysteine residue [1, 2]. The residues immediately flanking the cysteine have been shown to influence its redox potential and steric accessibility [3]. These sequence motifs have been fed into various prediction methods [4] such as machine learning approaches (i.e., statistical methods, neural networks (NNs) [5], and
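
As an illustration of the flanking-motif descriptor discussed above, the following minimal Python sketch extracts a fixed-width window around every cysteine in a sequence and tallies how often each motif is seen bonded or free. It is not the CMD pipeline: the window half-width (`flank`), the `-` padding character for chain ends, the function names, and the toy sequence with its bonded positions are all assumptions made for the example.

```python
from collections import defaultdict


def flanking_motifs(sequence, flank=2, pad="-"):
    """Yield (position, motif) for each cysteine in `sequence`.

    `position` is 1-based. `flank` (residues kept on each side) and the
    `pad` character used at chain ends are illustrative assumptions.
    """
    seq = sequence.upper()
    padded = pad * flank + seq + pad * flank
    for i, residue in enumerate(seq):
        if residue == "C":
            # In padded coordinates the cysteine sits at i + flank, so the
            # window spans i .. i + 2 * flank inclusive.
            yield i + 1, padded[i : i + 2 * flank + 1]


def tally_motifs(records, flank=2):
    """Count bonded and free occurrences of every motif.

    `records` is an iterable of (sequence, bonded_positions) pairs, where
    bonded_positions is a set of 1-based cysteine positions known to be
    in a disulphide bond (hypothetical input format).
    """
    counts = defaultdict(lambda: [0, 0])  # motif -> [bonded, free]
    for sequence, bonded_positions in records:
        for pos, motif in flanking_motifs(sequence, flank):
            counts[motif][0 if pos in bonded_positions else 1] += 1
    return dict(counts)


def always_bonded(counts):
    """Return motifs observed at least once and never in the free state."""
    return sorted(m for m, (bonded, free) in counts.items()
                  if bonded > 0 and free == 0)


# Toy data: one made-up sequence with cysteines 2 and 7 marked as bonded.
toy_records = [("ACDCGKC", {2, 7})]
toy_counts = tally_motifs(toy_records, flank=2)
print(list(flanking_motifs("ACDCGKC", flank=2)))
print(always_bonded(toy_counts))
```

Grouping occurrences by motif in this way is one simple route to the "always bonded" and "always nonbonded" categories mentioned in the abstract. A real analysis would also need to handle redundancy between homologous sequences.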
References
[1]
S. M. Muskal, S. R. Holbrook, and S. H. Kim, “Prediction of the disulfide-bonding state of cysteine in proteins,” Protein Engineering, vol. 3, no. 8, pp. 667–672, 1990.
[2]
M. H. Mucchielli-Giorgi, S. Hazout, and P. Tufféry, “Predicting the disulfide bonding state of cysteines using protein descriptors,” Proteins, vol. 46, no. 3, pp. 243–249, 2002.
[3]
F. Ferrè and P. Clote, “DiANNA 1.1: an extension of the DiANNA web server for ternary cysteine classification,” Nucleic Acids Research, vol. 34, pp. W182–W185, 2006.
[4]
R. Singh, “A review of algorithmic techniques for disulfide-bond determination,” Briefings in Functional Genomics and Proteomics, vol. 7, no. 2, pp. 157–172, 2008.
[5]
J. Song, Z. Yuan, H. Tan, T. Huber, and K. Burrage, “Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure,” Bioinformatics, vol. 23, no. 23, pp. 3147–3154, 2007.
[6]
Y. C. Chen, Y. S. Lin, C. J. Lin, and J. K. Hwang, “Prediction of the bonding states of cysteines using the support vector machines based on multiple feature vectors and cysteine state sequences,” Proteins, vol. 55, no. 4, pp. 1036–1042, 2004.
[7]
P. L. Martelli, P. Fariselli, and R. Casadio, “Prediction of disulfide-bonded cysteines in proteomes with a hidden neural network,” Proteomics, vol. 4, no. 6, pp. 1665–1671, 2004.
[8]
C. H. Tsai, B. J. Chen, C. H. Chan, H. L. Liu, and C. Y. Kao, “Improving disulfide connectivity prediction with sequential distance between oxidized cysteines,” Bioinformatics, vol. 21, no. 24, pp. 4416–4419, 2005.
[9]
A. Ceroni, A. Passerini, A. Vullo, and P. Frasconi, “Disulfind: a disulfide bonding state and cysteine connectivity prediction server,” Nucleic Acids Research, vol. 34, pp. W177–W181, 2006.
[10]
A. Vullo and P. Frasconi, “Disulfide connectivity prediction using recursive neural networks and evolutionary information,” Bioinformatics, vol. 20, no. 5, pp. 653–659, 2004.
[11]
J. Lenffer, P. Lai, W. El Mejaber et al., “CysView: protein classification based on cysteine pairing patterns,” Nucleic Acids Research, vol. 32, supplement, pp. W350–W355, 2004.
[12]
F. Hatahet and L. W. Ruddock, “Protein disulfide isomerase: a critical evaluation of its function in disulfide bond formation,” Antioxidants and Redox Signaling, vol. 11, no. 11, pp. 2807–2850, 2009.
[13]
J. E. Chambers, T. J. Tavender, O. B. V. Oka, S. Warwood, D. Knight, and N. J. Bulleid, “The reduction potential of the active site disulfides of human protein disulfide isomerase limits oxidation of the enzyme by Ero1α,” Journal of Biological Chemistry, vol. 285, no. 38, pp. 29200–29207, 2010.
[14]
P. Baldi, J. Cheng, and A. Vullo, “Large-scale prediction of disulphide bond connectivity,” Advances in Neural Information Processing Systems, no. 17, pp. 97–104, 2005.
[15]
B. D. O'Connor and T. O. Yeates, “GDAP: a web tool for genome-wide protein disulfide bond prediction,” Nucleic Acids Research, vol. 32, pp. W360–W364, 2004.