BMC Bioinformatics 2005
Speeding disease gene discovery by sequence based candidate prioritization

Abstract: We examined a variety of sequence-based features and found that for many of them there are significant differences between the sets of genes known to be involved in human hereditary disease and those not known to be involved in disease. We have created an automatic classifier called PROSPECTR based on those features using the alternating decision tree algorithm which ranks genes in the order of likelihood of involvement in disease. On average, PROSPECTR enriches lists for disease genes two-fold 77% of the time, five-fold 37% of the time and twenty-fold 11% of the time.

PROSPECTR is a simple and effective way to identify genes involved in Mendelian and oligogenic disorders. It performs markedly better than the single existing sequence-based classifier on novel data. PROSPECTR could save investigators looking at large regions of interest time and effort by prioritizing positional candidate genes for mutation detection and case-control association studies.

Over the last twenty years the genes underlying more than a thousand classically Mendelian disorders have been successfully identified. By contrast, only a relatively small number of genetic components of complex traits have been characterized [1].

Regions of interest identified through complex-trait linkage studies regularly exceed 30 centimorgans in size and can contain hundreds of genes. The traditional candidate-gene approach to reducing this number of genes to a manageable level involves attempting to match functional annotation to knowledge of the disease or phenotype under investigation.
Unfortunately this approach has been characterized by unsubstantiated and unreplicated claims [2]. Problems arise firstly because the link between genotype and phenotype in complex disorders tends to be weak; matching a single gene's functional annotation to a phenotype is unlikely to be successful unless the gene in question is clearly related to some known pathogenesis of the disease. Secondly, functional annotation of the human