BMC Bioinformatics 2008
GAPscreener: An automatic tool for screening human genetic association literature in PubMed using the support vector machine technique
Abstract: The data source for this research was the HuGE Navigator, formerly known as the HuGE Pub Lit database. Weighted SVM feature selection based on a keyword list obtained by the two-way z score method demonstrated the best screening performance, achieving 97.5% recall, 98.3% specificity and 31.9% precision in performance testing. Compared with the traditional screening process based on a complex PubMed query, the SVM tool reduced by about 90% the number of abstracts requiring individual review by the database curator. The tool also ascertained 47 articles that were missed by the traditional literature screening process during the 4-week test period. We examined the literature on genetic associations with preterm birth as an example. Compared with the traditional, manual process, the GAPscreener both reduced effort and improved accuracy.
GAPscreener is the first free SVM-based application available for screening the human genetic association literature in PubMed with high recall and specificity. The user-friendly graphical user interface makes this a practical, stand-alone application. The software can be downloaded at no charge.
The peer-reviewed scientific literature is a major source of information for developing research hypotheses and creating new knowledge through synthesis of research findings [1]. The information explosion in biomedical science has created a huge challenge for researchers, who want to obtain useful information promptly and efficiently. Human genetic association studies epitomize this challenge because they have proliferated rapidly since completion of the Human Genome Project [2]. Systematic review and meta-analysis have become important approaches for evaluating the robustness of such associations across different study platforms and populations [3].
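As a brief aside on the screening metrics quoted in the abstract: recall, specificity and precision are all ratios of confusion-matrix counts, and a precision as low as 31.9% is compatible with 98.3% specificity when relevant abstracts are rare. The Python sketch below is illustrative only. The function names and the example counts are hypothetical, and the prevalence it returns is a back-calculation from the rounded figures in the abstract, assuming all three metrics were measured on the same test set; it is not a value reported by the authors.

```python
def screening_metrics(tp, fp, tn, fn):
    """Return (recall, specificity, precision) from confusion-matrix counts."""
    recall = tp / (tp + fn)        # share of relevant abstracts that were flagged
    specificity = tn / (tn + fp)   # share of irrelevant abstracts that were rejected
    precision = tp / (tp + fp)     # share of flagged abstracts that are relevant
    return recall, specificity, precision


def implied_prevalence(recall, specificity, precision):
    """Solve precision = r*p / (r*p + (1 - s)*(1 - p)) for the prevalence p.

    p is the fraction of screened abstracts that are truly relevant.
    """
    fpr = 1.0 - specificity  # false-positive rate
    return precision * fpr / (recall * (1.0 - precision) + precision * fpr)


# Hypothetical counts, chosen only to exercise the definitions.
print(screening_metrics(tp=90, fp=10, tn=890, fn=10))

# Rounded figures quoted in the abstract.
print(implied_prevalence(recall=0.975, specificity=0.983, precision=0.319))
```

Under these assumptions the second call gives a prevalence of roughly 0.8%, meaning that fewer than one in a hundred screened abstracts would be relevant. At that base rate even a small false-positive rate produces more false alarms than true hits, which is why recall and specificity can both be high while precision stays modest.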
A key factor in the quality of a systematic review is complete capture of the relevant studies [4]. Many databases that deposit genetic association information, including citations from PubMed, have b