%0 Journal Article %T Gene and protein nomenclature in public databases %A Katrin Fundel %A Ralf Zimmer %J BMC Bioinformatics %D 2006 %I BioMed Central %R 10.1186/1471-2105-7-372 %X We compiled five gene and protein name dictionaries for each of the five model organisms (yeast, fly, mouse, rat, and human) from different organism-specific and general public databases. We analyzed the degree of ambiguity of gene and protein names within and between dictionaries, to a lexicon of common English words and domain-related non-gene terms, and we compared different data sources in terms of size of extracted dictionaries and overlap of synonyms between those. The study shows that the number of genes/proteins and synonyms covered in individual databases varies significantly for a given organism, and that the degree of ambiguity of synonyms varies significantly between different organisms. Furthermore, it shows that, despite considerable efforts of co-curation, the overlap of synonyms in different data sources is rather moderate and that the degree of ambiguity of gene names with common English words and domain-related non-gene terms varies depending on the considered organism. In conclusion, these results indicate that the combination of data contained in different databases allows the generation of gene and protein name dictionaries that contain significantly more used names than dictionaries obtained from individual data sources. Furthermore, curation of combined dictionaries considerably increases size and decreases ambiguity. The entries of the curated synonym dictionary are available for manual querying, editing, and PubMed- or Google-search via the ProThesaurus-wiki. For automated querying via custom software, we offer a web service and an exemplary client application. Genes and proteins are biological objects of primary importance for understanding biochemical processes.
The exchange of knowledge on any kind of object requires consistent names or identifiers for each object. So far, even though nomenclature paradigms are provided by several communities, the generation and assignment of names to newly identified genes and proteins is not strictly standa %U http://www.biomedcentral.com/1471-2105/7/372