
BioEve Search: A Novel Framework to Facilitate Interactive Literature Search

DOI: 10.1155/2012/509126


Abstract:

Background. Advances in computational and biological methods over the last two decades have remarkably changed the scale of biomedical research, bringing unprecedented growth in both the production of biomedical data and the amount of published literature discussing it. An automated extraction system coupled with a cognitive search and navigation service over these document collections would not only save time and effort but also pave the way to discovering hitherto unknown information implicitly conveyed in the texts.

Results. We developed a novel framework (named “BioEve”) that seamlessly integrates Faceted Search (Information Retrieval) with an Information Extraction module to provide an interactive search experience for researchers in the life sciences. It enables guided, step-by-step refinement of search queries by suggesting concepts and entities (such as genes, drugs, and diseases) that let the user quickly filter results and change search direction, thereby facilitating an enriched paradigm in which users discover related concepts and keywords while seeking information.

Conclusions. The BioEve Search framework makes it easier to enable scalable interactive search over large collections of textual articles and to discover knowledge hidden in thousands of biomedical literature articles.

1. Background

Human genome sequencing marked the beginning of the era of large-scale genomics and proteomics, leading to large quantities of information on sequences, genes, interactions, and their annotations. As the capability to analyze data increases, the output of high-throughput techniques likewise makes more information available for testing hypotheses and stimulating novel ones. Many experimental findings are reported in the -omics literature, where researchers have access to more than 20 million publications, with up to 4,500 new ones per day, available through the widely used PubMed citation index and Google Scholar.
This vast increase in available information demands novel strategies to help researchers keep up to date with recent developments, as ad hoc querying with Boolean queries is tedious and often misses important information. Even though PubMed provides an advanced keyword search and offers useful query expansion, it returns hundreds or thousands of articles as a result; these are sorted by publication date, with little help in selecting or drilling down to the few articles most relevant to the user’s actual question. As an example of both the amount of available information and the insufficiency of naïve
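
The guided refinement described in the abstract (suggesting genes, drugs, and diseases to narrow a result set) can be illustrated with a minimal sketch. This is not the BioEve implementation: the toy corpus `DOCS`, the facet names, and the helper functions `refine` and `suggest` are hypothetical and chosen only to show the general idea of faceted search over documents annotated by an extraction step.

```python
from collections import Counter

# Hypothetical toy corpus: each document carries entity annotations,
# grouped by facet, as an information extraction step might produce.
DOCS = [
    {"id": 1, "facets": {"gene": {"BRCA1", "TP53"},
                         "disease": {"breast cancer"},
                         "drug": {"tamoxifen"}}},
    {"id": 2, "facets": {"gene": {"TP53"},
                         "disease": {"lung cancer"},
                         "drug": {"cisplatin"}}},
    {"id": 3, "facets": {"gene": {"BRCA1"},
                         "disease": {"breast cancer"},
                         "drug": {"cisplatin"}}},
    {"id": 4, "facets": {"gene": {"EGFR"},
                         "disease": {"lung cancer"},
                         "drug": {"gefitinib"}}},
]


def refine(docs, selections):
    """Keep only documents that contain every selected (facet, value) pair."""
    return [
        d for d in docs
        if all(value in d["facets"].get(facet, set())
               for facet, value in selections)
    ]


def suggest(docs, top_n=3):
    """Count facet values in the current result set; the most frequent
    values per facet are offered as candidate next refinements."""
    counts = {}
    for d in docs:
        for facet, values in d["facets"].items():
            counts.setdefault(facet, Counter()).update(values)
    return {facet: c.most_common(top_n) for facet, c in counts.items()}


# Step 1: the user picks a disease; step 2: the system suggests entities
# that co-occur in the remaining documents; step 3: the user narrows again.
hits = refine(DOCS, [("disease", "breast cancer")])
suggestions = suggest(hits)
narrowed = refine(DOCS, [("disease", "breast cancer"), ("drug", "cisplatin")])
```

Each selection shrinks the result set, and the suggestions are recomputed from whatever documents remain, so the user only ever sees refinements that lead to at least one article.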

