|
BMC Bioinformatics 2006
CoSMoS: Conserved Sequence Motif Search in the proteome

Abstract: We have created a database called CoSMoS to find the occurrences, and at the same time evaluate the significance, of sequence motifs and amino acids encoded in the whole genome of the model organism Escherichia coli K12. We provide a precomputed set of multiple sequence alignments for each individual E. coli protein with all of its homologues in the RefSeq database. The alignments themselves, information about the occurrence of sequence motifs, and information on the conservation of each of the more than 1.3 million amino acids encoded in the E. coli genome can be accessed via the web interface of CoSMoS.

CoSMoS is a valuable tool to identify highly conserved sequence motifs, to find regions suitable for mutational studies in functional analyses, and to predict important structural features in E. coli proteins.

The number of newly sequenced genes has been growing exponentially over the last decades [1]. This makes it technically impossible to assign functions to these newly discovered proteins, and to investigate their regulation, by experimental biology alone. Over the past years, computational biology has been shown to be a powerful tool to assist in these assignments. This is based on the fact that proteins that share high sequence similarity, either within one organism or between different organisms, often perform very similar functions. Thus, the function of an unknown protein can often be predicted directly using a homology search against a database of proteins with assigned functions. Powerful algorithms and search tools such as BLAST have been developed to perform these homology searches [2,3].

The data derived from these homology searches also provide valuable information about the evolutionary conservation of every single amino acid in the sequence.
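As an illustration of how per-residue conservation can be read off a multiple sequence alignment, the following minimal sketch computes, for every residue of a reference protein, the fraction of aligned homologues that carry the identical amino acid at that position. This is a simple percent-identity measure chosen for illustration only; it is not necessarily the scoring scheme used by CoSMoS. The function name `residue_conservation`, its parameters, and the toy alignment are hypothetical.

```python
def residue_conservation(alignment, ref_index=0, gap="-"):
    """Return (residue, identity fraction) for each non-gap reference residue.

    `alignment` is a list of equal-length aligned sequences (strings).
    Illustrative assumption: conservation is measured as the fraction of
    homologues with the identical residue in the same alignment column.
    """
    if len(alignment) < 2:
        raise ValueError("need the reference plus at least one homologue")
    ref = alignment[ref_index]
    if any(len(seq) != len(ref) for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    # Every sequence except the reference counts as a homologue.
    homologues = [s for i, s in enumerate(alignment) if i != ref_index]

    scores = []
    for col, ref_res in enumerate(ref):
        if ref_res == gap:
            # Columns where the reference has a gap carry no reference residue.
            continue
        identical = sum(1 for seq in homologues if seq[col] == ref_res)
        scores.append((ref_res, identical / len(homologues)))
    return scores


# Toy alignment (made-up sequences): the first entry is the reference.
toy_alignment = ["MKC-A", "MKCGA", "MRC-A", "MKS-T"]
conservation = residue_conservation(toy_alignment)
for residue, fraction in conservation:
    print(f"{residue}\t{fraction:.2f}")
```

In this toy case the first residue is identical in all homologues, while the remaining reference residues are matched by two of the three homologues. A real analysis would typically also weight sequences by similarity and account for conservative substitutions.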
The neutral theory of molecular evolution states that amino acid mutations accumulate at a stochastically constant rate as long as they have no effect on the function of the gene product [4]. On the o
|