All Title Author
Keywords Abstract

Construction of an annotated corpus to support biomedical information extraction

DOI: 10.1186/1471-2105-10-349

Full-Text   Cite this paper   Add to My Lib


We have defined a new scheme for annotating sentence-bound gene regulation events, centred on both verbs and nominalised verbs. For each event instance, all participants (arguments) in the same sentence are identified and assigned a semantic role from a rich set of 13 roles tailored to biomedical research articles, together with a biological concept type linked to the Gene Regulation Ontology. To our knowledge, our scheme is unique within the biomedical field in terms of the range of event arguments identified. Using the scheme, we have created the Gene Regulation Event Corpus (GREC), consisting of 240 MEDLINE abstracts, in which events relating to gene regulation and expression have been annotated by biologists. A novel method of evaluating various different facets of the annotation task showed that average inter-annotator agreement rates fall within the range of 66% - 90%.The GREC is a unique resource within the biomedical field, in that it annotates not only core relationships between entities, but also a range of other important details about these relationships, e.g., location, temporal, manner and environmental conditions. As such, it is specifically designed to support bio-specific tool and resource development. It has already been used to acquire semantic frames for inclusion within the BioLexicon (a lexical, terminological resource to aid biomedical text mining). Initial experiments have also shown that the corpus may viably be used to train IE components, such as semantic role labellers. The corpus and annotation guidelines are freely available for academic purposes.Due to the rapid advances in biomedical research, scientific literature is being published at an ever-increasing rate [1]. Without automated means, it is difficult for researchers to keep abreast of developments within biomedicine [2-6]. Text mining, which is receiving increasing interest within the biomedical field [7,8], enriches text via the addition of semantic metadata, and thus permits ta


comments powered by Disqus

Contact Us


微信:OALib Journal