Human Genomics 2011
Overview of biological database mapping services for interoperation between different 'omics' datasets
DOI: 10.1186/1479-7364-5-6-703
Keywords: omics, genomics, proteomics, identifier mapping, biological database identifier converter
Abstract: Many primary biological databases are dedicated to providing annotation for a specific type of biological molecule such as a clone, transcript, gene or protein (eg the National Center for Biotechnology Information [NCBI] Entrez Gene[1] database provides annotation for genes, whereas UniProtKB[2,3] provides this for proteins). Other, secondary databases provide relevant information about the attributes of these molecules, such as pathway, function or structural information (eg the Kyoto Encyclopedia of Genes and Genomes [KEGG][4] and the Protein Data Bank [PDB][5]). Often, these databases provide only limited cross-references for interoperation with other databases. Thus, enhanced mapping between these databases is required to facilitate the correlation of independent experimental datasets, and this mapping can be provided by Identifier (Id) mapping services.

Id mapping services are tools that connect one type of database Id to the corresponding Id in another database. This mapping includes three types of relationships: one-to-one, one-to-many and many-to-many. One-to-many and many-to-many relationships are required to account for biological processes such as alternative splicing, in which one gene gives rise to multiple transcripts, the presence of several isoforms of a single protein, and other similar processes occurring in a cell. Also, gene expression data such as microarray data are known to have multiple probes targeting a single transcript and vice versa (eg Affymetrix[6] probes, which can be described by many-to-many relationships).
Thus, mapping Ids of multiple databases to one another facilitates the correlation of different types of 'omics' datasets which, in turn, might provide meaningful insights into the biological processes occurring in a cell.

Several Id mapping services are publicly available (Table 1). The seven Id converters that are discussed in detail in this review were selected to represent the majority of Id converters--as well as major biological data
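The three relationship types named in the abstract (one-to-one, one-to-many and many-to-many) can be illustrated with a minimal Python sketch. This is not the implementation of any of the reviewed services. The identifiers (`GENE_A`, `TX_A1`, `PROBE_1` and so on) and the helper names `build_index`, `classify` and `map_ids` are hypothetical, chosen only for illustration, and the sketch assumes a mapping is available as a plain list of (source Id, target Id) pairs.

```python
from collections import defaultdict

# Hypothetical (source_id, target_id) pairs; these are NOT real database records.
GENE_TO_TRANSCRIPT = [
    ("GENE_A", "TX_A1"),
    ("GENE_A", "TX_A2"),  # alternative splicing: one gene, two transcripts
    ("GENE_B", "TX_B1"),
]
PROBE_TO_TRANSCRIPT = [
    ("PROBE_1", "TX_A1"),
    ("PROBE_2", "TX_A1"),  # two probes target the same transcript
    ("PROBE_2", "TX_A2"),  # one probe matches two transcripts
]


def build_index(pairs):
    """Return forward (source -> targets) and reverse (target -> sources) lookups."""
    forward = defaultdict(set)
    reverse = defaultdict(set)
    for source_id, target_id in pairs:
        forward[source_id].add(target_id)
        reverse[target_id].add(source_id)
    return forward, reverse


def classify(pairs):
    """Name the relationship type of a mapping, ignoring its direction."""
    forward, reverse = build_index(pairs)
    fan_out = any(len(targets) > 1 for targets in forward.values())
    fan_in = any(len(sources) > 1 for sources in reverse.values())
    if fan_out and fan_in:
        return "many-to-many"
    if fan_out or fan_in:
        return "one-to-many"
    return "one-to-one"


def map_ids(ids, pairs):
    """Map each input Id to a sorted list of target Ids (empty if unmapped)."""
    forward, _ = build_index(pairs)
    return {i: sorted(forward.get(i, ())) for i in ids}


if __name__ == "__main__":
    print(classify(GENE_TO_TRANSCRIPT))
    print(classify(PROBE_TO_TRANSCRIPT))
    print(map_ids(["GENE_A", "GENE_X"], GENE_TO_TRANSCRIPT))
```

Returning a list of targets for every input Id, rather than a single value, is one way to keep one-to-many and many-to-many results intact. Unmapped Ids come back as empty lists so that they remain visible when datasets are correlated.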