Human Genomics 2011
In-silico human genomics with GeneCards
DOI: 10.1186/1479-7364-5-6-709
Keywords: GeneCards, GeneDecks, Partner Hunter, Set Distiller, omics, genomics, human genes, database, synthetic lethality, genetic variations

Abstract: From the very beginning, GeneCards has had two core components: the capability to view integrated details about a gene in 'card' format, and a full-text search engine. GeneCards has evolved by constantly adding new data sources and data types (eg protein expression and gene networks), by revamping the search engine to improve results and performance, and by expanding the original gene-centric dogma to encompass sets of genes.

Currently, GeneCards automatically mines over 90 sources in an offline process and constructs a consolidated gene list. First, the complete current snapshot of the HUGO Gene Nomenclature Committee (HGNC)-approved symbols[1] is used as the core gene list. Next, human Entrez Gene[2] entries that do not correspond to HGNC genes are added. Finally, human Ensembl[3] records are matched against the emerging gene list via GeneLoc's exon-based unification algorithm;[4] those not found to be equivalent to any gene already in the set are included as novel Ensembl-based GeneCards gene entries. These primary sources provide annotations for aliases, descriptions, previous symbols, gene category, location, summaries, paralogues and non-coding RNA (ncRNA) details. Once the gene list and these core annotations are in place, over 90 data sources (including those noted above and others[4-9]) are mined for thousands of additional descriptors.

The data for each gene are collected into a text file that is used to display the web-card. In addition to this legacy text-file format, the complex data model of GeneCards version 3 is stored in relational databases [10].
One database ('by resource') stores the data largely in the structure in which they were originally mined, while another ('by function') supports the website and has over 130 tables and views, averaging hundreds of thousands of records each. The largest table has over 6.5 million rows. This compendium is modelled as 40 entities, with hundreds of hierarchical relationships.
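The three-step construction of the consolidated gene list described above (HGNC core, then Entrez Gene entries with no HGNC counterpart, then Ensembl records not equivalent to any gene already listed) can be sketched as follows. This is a minimal illustration under stated assumptions, not the GeneCards pipeline itself. The `GeneRecord` type, its `hgnc_xref` field, the functions `exons_overlap` and `build_gene_list`, and all sample records (GENE1, 1001, 1002, ENSG_A, ENSG_B) are hypothetical. Two further assumptions: an Entrez entry is taken to be "different from" the HGNC genes when it carries no cross-reference to an HGNC symbol, and `exons_overlap` is a deliberately simplified stand-in for GeneLoc's exon-based unification, treating two records as equivalent when they lie on the same chromosome and strand and share at least one overlapping exon.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical exon representation: (start, end), inclusive genomic coordinates.
Exon = Tuple[int, int]


@dataclass
class GeneRecord:
    """Hypothetical minimal gene record; not the GeneCards data model."""
    gene_id: str                      # HGNC symbol, Entrez ID or Ensembl ID
    source: str                       # "HGNC", "Entrez" or "Ensembl"
    chrom: str = ""
    strand: str = "+"
    exons: List[Exon] = field(default_factory=list)
    hgnc_xref: Optional[str] = None   # HGNC symbol an Entrez entry maps to, if any


def exons_overlap(a: GeneRecord, b: GeneRecord) -> bool:
    """Simplified stand-in for GeneLoc's exon-based unification (assumption):
    same chromosome and strand, and at least one pair of overlapping exons."""
    if a.chrom != b.chrom or a.strand != b.strand:
        return False
    return any(s1 <= e2 and s2 <= e1
               for s1, e1 in a.exons
               for s2, e2 in b.exons)


def build_gene_list(hgnc: List[GeneRecord],
                    entrez: List[GeneRecord],
                    ensembl: List[GeneRecord]) -> List[GeneRecord]:
    # Step 1: HGNC-approved symbols form the core gene list.
    genes = list(hgnc)
    hgnc_symbols = {g.gene_id for g in hgnc}
    # Step 2: add Entrez Gene entries that have no HGNC counterpart.
    for rec in entrez:
        if rec.hgnc_xref not in hgnc_symbols:
            genes.append(rec)
    # Step 3: add Ensembl records not equivalent to any gene already in the
    # growing list; matched records are not added as separate entries.
    for rec in ensembl:
        if not any(exons_overlap(rec, g) for g in genes):
            genes.append(rec)
    return genes


# Usage with made-up records (identifiers are placeholders, not real genes).
hgnc = [GeneRecord("GENE1", "HGNC", "1", "+", [(100, 200), (300, 400)])]
entrez = [
    GeneRecord("1001", "Entrez", "1", "+", [(100, 200)], hgnc_xref="GENE1"),
    GeneRecord("1002", "Entrez", "2", "-", [(500, 600)]),
]
ensembl = [
    GeneRecord("ENSG_A", "Ensembl", "1", "+", [(350, 450)]),  # overlaps GENE1
    GeneRecord("ENSG_B", "Ensembl", "3", "+", [(10, 20)]),    # novel locus
]
print([g.gene_id for g in build_gene_list(hgnc, entrez, ensembl)])
# prints ['GENE1', '1002', 'ENSG_B']
```

Because each Ensembl record is compared against the growing list, two mutually overlapping Ensembl records that match nothing earlier collapse into a single new entry. The pairwise comparison is quadratic; a production pipeline would more plausibly index records by chromosome and coordinate interval before testing overlaps.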