|
BMC Bioinformatics 2008
Extending pathways based on gene lists using InterPro domain signatures
Abstract: In order to validate our approach, we designed a simulation study. Based on all pathways available in the KEGG database, we create test gene lists by randomly selecting pathway genes, removing these genes from the known pathways and adding variable amounts of noise in the form of genes not annotated to the pathway. We show that we can recover pathway memberships based on the simulated gene lists with high accuracy. We further demonstrate the applicability of our approach on a biological example.
Results based on simulation and data analysis show that domain-based pathway enrichment analysis is a very sensitive method to test for enrichment of pathways in sparsely annotated lists of genes. An R-based software package, domainsignatures, for routinely performing this analysis on the results of high-throughput screening is available via Bioconductor.
Many high-throughput techniques, such as DNA microarray analysis, siRNA screens or proteomic approaches, produce extensive data. After careful statistical analysis, the result of such an experiment is typically a list of candidate genes relevant for certain biological processes, or an ordered gene list sorted according to significance in one or more biological processes [1]. The analysis and interpretation of such lists is a major bottleneck and a task for bioinformatics and systems biology. Many approaches have been published that assess the significant over-representation of biological functions or pathways, as annotated in GO or pathway databases, through gene set enrichment analysis [2-8].
However, for many of the screened genes there is hardly any functional annotation available. For example, the number of human genes annotated in the KEGG database [9] is only about 4,000.
This contrasts with the estimated number of putative protein-coding genes, which exceeds 23,000 (counted as the number of Entrez gene IDs in the IPI-human database) [10,11]. Many approaches rely on automatically inferred functional annotations [12].
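The simulation design outlined in the abstract (draw genes from a pathway, remove them from the known pathway, then add noise genes not annotated to it) can be illustrated with a minimal sketch. This is not the code of the domainsignatures package; the function name `make_test_gene_list`, its parameters and the sampling details are assumptions made purely for illustration.

```python
import random


def make_test_gene_list(pathway_genes, background_genes, n_signal, n_noise, seed=None):
    """Build one simulated test gene list from a single pathway (illustrative only).

    Returns (test_list, reduced_pathway):
      test_list       -- n_signal genes drawn from the pathway plus n_noise
                         genes that are not annotated to the pathway
      reduced_pathway -- the pathway with the drawn genes removed
    """
    rng = random.Random(seed)
    pathway = sorted(set(pathway_genes))
    # Noise candidates: background genes not annotated to this pathway.
    outside = sorted(set(background_genes) - set(pathway))

    signal = rng.sample(pathway, n_signal)   # randomly selected pathway genes
    noise = rng.sample(outside, n_noise)     # variable amount of noise

    drawn = set(signal)
    reduced_pathway = [g for g in pathway if g not in drawn]

    test_list = signal + noise
    rng.shuffle(test_list)                   # hide which genes are signal
    return test_list, reduced_pathway
```

Calling the function with a pathway's gene identifiers, a background set such as all annotated genes, and varying `n_noise` would give test lists of different noise levels. A method under evaluation would then be asked to recover the original pathway from `test_list` while seeing only `reduced_pathway`.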
|