%0 Journal Article %T CMD: A Database to Store the Bonding States of Cysteine Motifs with Secondary Structures %A Hamed Bostan %A Naomie Salim %A Zeti Azura Hussein %A Peter Klappa %A Mohd Shahir Shamsir %J Advances in Bioinformatics %D 2012 %I Hindawi Publishing Corporation %R 10.1155/2012/849830 %X Computational approaches to predicting the disulphide bonding state and its connectivity pattern are based on various descriptors. One such descriptor is the amino acid sequence motif flanking each cysteine residue. Although disulphide bonding information exists in many databases and applications, no complete reference or motif query resource is currently available. The Cysteine Motif Database (CMD) is the first online resource that stores all cysteine residues and their flanking motifs, together with their secondary structure and propensity value assignments derived from laboratory data. We extracted more than 3 million cysteine motifs from PDB and UniProt data and annotated them with secondary structure assignments, propensity values, frequency of occurrence, and the coefficiency of their bonding status. Removing redundancies yielded 15875 unique flanking motifs that are always bonded and 41577 unique motifs that are always nonbonded. The database can be queried by protein ID, FASTA sequence, sequence motif, or secondary structure, either individually or in batch format; the provided APIs allow remote users to query our database through third-party software and for high-throughput screening or querying. The CMD offers extensive information about bonded and free cysteine residues and their motifs, allowing in-depth characterization of sequence motif composition. 1. Background Disulphide bonds are formed by the oxidation of two cysteine residues in a protein and are important for the protein's conformational stability: they confer greater thermal and chemical stability and stabilize structural intermediates, ensuring the correct folding pathway.
However, the connectivity of disulphide bonds in proteins can only be determined experimentally. Given this difficulty, the ability to evaluate or predict the disulphide bonding state and connectivity from sequence alone would be highly valuable in engineering proteins for biotechnological and medical applications. Computational approaches to disulphide connectivity prediction have been based on various descriptors. One of these descriptors is the sequence motif generated by combining the residues flanking either side of the cysteine residue [1, 2]. The residues immediately flanking the cysteine have been shown to influence the cysteine's redox potential and steric accessibility [3]. These sequence motifs have been fed into various prediction methods [4], such as machine learning approaches (e.g., statistical methods, neural networks (NNs) [5], and %U http://www.hindawi.com/journals/abi/2012/849830/
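The flanking-motif descriptor and the "always bonded" / "always nonbonded" classification described above can be illustrated with a short Python sketch. It assumes a window of two residues on each side of the cysteine, pads positions near the chain termini with a hypothetical `-` character, and takes hypothetical `(sequence, bonded_positions)` pairs as input. The window size, padding convention, input format, and the function names `flanking_motifs` and `classify_motifs` are illustrative assumptions rather than the CMD's actual implementation, and secondary structure annotation is omitted.

```python
from collections import Counter, defaultdict


def flanking_motifs(sequence, window=2, pad="-"):
    """Return (position, motif) for every cysteine in `sequence`.

    Each motif is the `window` residues before the cysteine, the cysteine
    itself, and the `window` residues after it. Termini are padded with
    `pad` (a hypothetical convention, not taken from the CMD).
    """
    seq = sequence.upper()
    padded = pad * window + seq + pad * window
    motifs = []
    for i, residue in enumerate(seq):
        if residue == "C":
            # Residue i of `seq` sits at index i + window in `padded`,
            # so the motif spans padded[i : i + 2 * window + 1].
            motifs.append((i, padded[i : i + 2 * window + 1]))
    return motifs


def classify_motifs(records, window=2):
    """Tally bonding states per motif and split out single-state motifs.

    `records` is an iterable of (sequence, bonded_positions) pairs, where
    bonded_positions holds 0-based indices of disulphide-bonded cysteines
    (a hypothetical input format).
    """
    counts = defaultdict(Counter)
    for sequence, bonded in records:
        for pos, motif in flanking_motifs(sequence, window):
            state = "bonded" if pos in bonded else "free"
            counts[motif][state] += 1
    # Counter returns 0 for missing keys, so these tests are safe.
    always_bonded = {m for m, c in counts.items() if c["free"] == 0}
    always_free = {m for m, c in counts.items() if c["bonded"] == 0}
    return counts, always_bonded, always_free
```

For example, `classify_motifs([("MKCAYCL", {2}), ("GGMKCAYGG", {4})])` places `MKCAY` in the always-bonded set and `AYCL-` in the always-nonbonded set; a motif observed in both states would fall into neither set.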