BioLemmatizer: a lemmatization tool for morphological processing of biomedical text

DOI: 10.1186/2041-1480-3-3

Abstract:

In this work, we developed a domain-specific lemmatization tool, BioLemmatizer, for the morphological analysis of biomedical literature. The tool focuses on the inflectional morphology of English and is based on the general English lemmatization tool MorphAdorner. The BioLemmatizer is further tailored to the biological domain through the incorporation of several published lexical resources. It retrieves lemmas from a word lexicon and, when a word is not found in the lexicon, applies a set of rules that transform the word into a lemma. An innovative aspect of the BioLemmatizer is its hierarchical strategy for searching the lexicon, which enables the discovery of the correct lemma even if the input Part-of-Speech information is inaccurate. The BioLemmatizer achieves an accuracy of 97.5% in lemmatizing an evaluation set prepared from the CRAFT corpus, a collection of full-text biomedical articles, and an accuracy of 97.6% on the LLL05 corpus. Its contribution to improving the accuracy of a practical information extraction task is further demonstrated when it is used as a component in a biomedical text mining system.

The BioLemmatizer outperforms other tools when compared with eight existing lemmatizers. It is released as open source software and can be downloaded from http://biolemmatizer.sourceforge.net.

Lemmatization is an important fundamental natural language processing (NLP) task. It is a morphological transformation that changes a word as it appears in running text into the base or dictionary form of the word, known as a lemma, by removing the inflectional ending of the word. The lemma corresponds to the singular form in the case of a noun, the infinitive form in the case of a verb, and the positive form in the case of an adjective or adverb. We can think of lemmatization as a normalization process in which different morphological variants of a word are mapped into the same underlying l
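The lexicon-first lookup with a rule-based fallback described in the abstract can be illustrated with a small Python sketch. This is not the BioLemmatizer implementation or its API: the toy lexicon, the coarse part-of-speech classes, the back-off order (exact tag, then same coarse class, then any tag), and the suffix rules are all assumptions chosen only to show how a hierarchical search might still recover a lemma when the input tag is wrong.

```python
import re

# Toy lexicon keyed by (word, POS tag). Entries are illustrative only and
# are not taken from the BioLemmatizer lexicon.
LEXICON = {
    ("cells", "NNS"): "cell",
    ("mice", "NNS"): "mouse",
    ("binding", "VBG"): "bind",
    ("binding", "NN"): "binding",
    ("phosphorylated", "VBN"): "phosphorylate",
}

# Hypothetical mapping from fine-grained tags to coarse classes.
COARSE = {
    "NN": "noun", "NNS": "noun",
    "VB": "verb", "VBD": "verb", "VBG": "verb", "VBN": "verb", "VBZ": "verb",
    "JJ": "adj",
}

# Hypothetical suffix rules, tried in order when the lexicon has no entry.
RULES = [
    (r"sses$", "ss"),     # classes -> class
    (r"ies$", "y"),       # antibodies -> antibody
    (r"(?<!s)s$", ""),    # assays -> assay, but leave "mass" alone
]


def lemmatize(word, pos):
    """Return (lemma, source), where source names the step that matched."""
    word = word.lower()

    # Step 1: exact match on word and tag.
    if (word, pos) in LEXICON:
        return LEXICON[(word, pos)], "exact"

    # Step 2: same word under any tag in the same coarse class.
    coarse = COARSE.get(pos)
    if coarse is not None:
        for (w, p), lemma in LEXICON.items():
            if w == word and COARSE.get(p) == coarse:
                return lemma, "coarse"

    # Step 3: same word under any tag at all.
    for (w, _), lemma in LEXICON.items():
        if w == word:
            return lemma, "any"

    # Step 4: the word is not in the lexicon, so fall back to suffix rules.
    for pattern, replacement in RULES:
        candidate, count = re.subn(pattern, replacement, word)
        if count:
            return candidate, "rule"

    # Nothing applied: return the word unchanged.
    return word, "identity"
```

In this sketch, `lemmatize("phosphorylated", "JJ")` still reaches the lexicon entry stored under `VBN` through the final lexicon step, while a word absent from the lexicon, such as `"assays"`, is handled by the suffix rules.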
