PLOS ONE  2009 

Large Scale Application of Neural Network Based Semantic Role Labeling for Automated Relation Extraction from Biomedical Texts

DOI: 10.1371/journal.pone.0006393

Abstract:

To reduce the increasing amount of time spent on literature search in the life sciences, several methods for automated knowledge extraction have been developed. Co-occurrence-based approaches can process large text corpora such as MEDLINE in acceptable time but cannot extract any specific type of semantic relation. Semantic relation extraction methods based on syntax trees, on the other hand, are computationally expensive, and the generated trees are difficult to interpret. Several natural language processing (NLP) approaches for the biomedical domain exist that focus specifically on detecting a limited set of relation types. Systems biology needs generic approaches that detect a multitude of relation types and can also process large text corpora, but very few systems meet both requirements.

We introduce the use of SENNA (“Semantic Extraction using a Neural Network Architecture”), a fast and accurate neural-network-based Semantic Role Labeling (SRL) program, for the large-scale extraction of semantic relations from the biomedical literature. A comparison of the processing times of SENNA and other SRL systems or syntactic parsers used in the biomedical domain revealed that SENNA is the fastest Proposition Bank (PropBank) conforming SRL program currently available. Using SENNA, 89 million biomedical sentences were tagged on a 100-node cluster within three days. The accuracy of the presented relation extraction approach was evaluated on two test sets of annotated sentences, yielding a precision of 0.71 and a recall of 0.43. We show that the accuracy and the processing speed of the proposed semantic relation extraction approach are sufficient for its large-scale application to biomedical text. The proposed approach is highly generalizable with regard to the supported relation types and appears to be especially suited for general-purpose, broad-scale text mining systems.

The presented approach bridges the gap between fast, co-occurrence-based approaches, which lack semantic relations, and highly specialized, computationally demanding NLP approaches.
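
To illustrate how PropBank-style role labels can be turned into relations, here is a minimal Python sketch. It is not the authors' pipeline and does not call SENNA: the input structure, the function name `frames_to_triples`, and the example sentence are all assumptions made for illustration. It only assumes that each predicate comes with a verb span (`V`) and numbered arguments such as `A0` (agent-like) and `A1` (patient-like), as is usual for PropBank-conforming labelers.

```python
from typing import Dict, List, Tuple

# Hypothetical input: one dict per predicate frame, mapping a PropBank-style
# label ("V", "A0", "A1", "AM-LOC", ...) to the text span carrying that role.
Frame = Dict[str, str]
Triple = Tuple[str, str, str]


def frames_to_triples(frames: List[Frame]) -> List[Triple]:
    """Turn labeled predicate frames into (agent, predicate, theme) triples.

    Frames that lack a verb, an A0 argument, or an A1 argument are skipped,
    so incomplete frames never produce a relation.
    """
    triples: List[Triple] = []
    for frame in frames:
        verb = frame.get("V")
        agent = frame.get("A0")
        theme = frame.get("A1")
        if verb and agent and theme:
            triples.append((agent, verb, theme))
    return triples


# Invented frames for the made-up sentence
# "Protein X activates gene Y and represses gene Z in tumor cells."
example_frames: List[Frame] = [
    {"A0": "protein X", "V": "activates", "A1": "gene Y"},
    {"A0": "protein X", "V": "represses", "A1": "gene Z",
     "AM-LOC": "in tumor cells"},
    {"V": "observed", "A1": "apoptosis"},  # no A0, so this frame is skipped
]

triples = frames_to_triples(example_frames)
for agent, verb, theme in triples:
    print(f"{agent} --{verb}--> {theme}")
```

Because the relation type is simply the labeled verb, a sketch like this is not tied to a fixed list of relation types, which is the sense in which the abstract calls the approach generalizable. Requiring both `A0` and `A1` is a deliberately strict choice that would tend to favor precision over recall.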
