

-  2017 

Search Datasets in Literature: A Case Study of GWAS


Abstract:

One of the missions of the NIH BD2K (Big Data to Knowledge) initiative is to make data discoverable and to promote the re-use of existing datasets. Our ultimate goal is to develop a scalable approach that can automatically scan millions of scientific publications and identify their underlying datasets. Using Genome-Wide Association Studies (GWAS) as a use case, we conducted an initial study to identify GWAS dataset attributes in MEDLINE abstracts by developing a hybrid approach that combines domain dictionaries and pattern-based rules. The automatic GWAS dataset attribute recognition system achieved an F-measure of 84.85%. We further applied the system to index MEDLINE abstracts and built an online GWAS dataset search engine called “GWAS Dataset Finder”. Our evaluation showed that the GWAS Dataset Finder significantly outperformed PubMed in retrieving literature with the desired datasets. Our study demonstrates the potential of text mining methods for building a data discovery index. Such methods can create a better index of the literature linked to its underlying datasets, thus improving data discoverability.
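The abstract does not list the dictionaries or rules the system uses, so the following is only a minimal sketch of what a hybrid dictionary-plus-pattern recognizer and an F-measure calculation might look like. The term lists, regular expressions, attribute labels, and function names are all hypothetical and chosen purely for illustration; they are not the authors' implementation.

```python
import re

# Hypothetical mini-dictionaries; the real system's dictionaries are not
# described in the abstract.
PLATFORM_TERMS = ["Affymetrix", "Illumina"]
TRAIT_TERMS = ["type 2 diabetes", "breast cancer"]

# Hypothetical pattern-based rules for numeric dataset attributes.
SAMPLE_SIZE = re.compile(
    r"\b(\d[\d,]*)\s+(cases|controls|individuals|participants|subjects)\b",
    re.IGNORECASE,
)
SNP_COUNT = re.compile(
    r"\b(\d[\d,]*)\s+(?:SNPs|single[- ]nucleotide polymorphisms)\b",
    re.IGNORECASE,
)


def dictionary_matches(text, terms, label):
    """Return (label, matched_text) pairs for every dictionary term in text."""
    found = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, re.IGNORECASE):
            found.append((label, match.group(0)))
    return found


def extract_attributes(text):
    """Combine dictionary lookups with pattern rules over one abstract."""
    attrs = []
    attrs += dictionary_matches(text, PLATFORM_TERMS, "platform")
    attrs += dictionary_matches(text, TRAIT_TERMS, "trait")
    for match in SAMPLE_SIZE.finditer(text):
        attrs.append(("sample_size", f"{match.group(1)} {match.group(2).lower()}"))
    for match in SNP_COUNT.finditer(text):
        attrs.append(("snp_count", match.group(1)))
    return attrs


def f_measure(predicted, gold):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sample = (
        "We genotyped 500,000 SNPs in 1,200 cases and 1,500 controls "
        "with type 2 diabetes using the Affymetrix array."
    )
    for label, value in extract_attributes(sample):
        print(label, value)
```

In a sketch like this, the dictionaries cover closed vocabularies (platform names, traits) while the patterns cover open-ended values such as sample sizes, which is one plausible reading of why a hybrid design would be chosen. The extracted attributes could then serve as index fields for a dataset-oriented search.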

