PLOS ONE, 2014

Knowledge Extraction and Semantic Annotation of Text from the Encyclopedia of Life

DOI: 10.1371/journal.pone.0089550


Abstract:

Numerous digitization and ontology initiatives have focused on translating biological knowledge from narrative text into machine-readable formats. In this paper, we describe two workflows for knowledge extraction and semantic annotation of text data objects featured in an online biodiversity aggregator, the Encyclopedia of Life. One workflow tags text with DBpedia URIs based on keywords. The other finds taxon names in text using GNRD (Global Names Recognition and Discovery) in order to build a species association network. Both workflows perform well: the annotation workflow has an F1 score of 0.941, and the association algorithm has an F1 score of 0.885. Existing text annotators such as Terminizer and DBpedia Spotlight also perform well but require some optimization to be useful in the ecology and evolution domain. Important future work includes scaling up and improving accuracy through the use of distributional semantics.
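The two workflows and the evaluation metric summarized above can be illustrated with a minimal Python sketch. This is not the authors' implementation. The keyword-to-URI dictionary is a small hypothetical sample. Taxon names are assumed to have already been extracted from each text object (for example by a name finder such as GNRD). Two taxa are assumed to be associated whenever their names co-occur in the same text object. F1 is computed as the harmonic mean of precision and recall.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical keyword-to-DBpedia-URI dictionary (illustrative entries only).
KEYWORD_URIS = {
    "predator": "http://dbpedia.org/resource/Predation",
    "herbivore": "http://dbpedia.org/resource/Herbivore",
    "pollinator": "http://dbpedia.org/resource/Pollinator",
}


def tag_text(text, keyword_uris=KEYWORD_URIS):
    """Return the sorted URIs whose keyword appears as a whole word in text."""
    lowered = text.lower()
    tags = set()
    for keyword, uri in keyword_uris.items():
        # Word boundaries prevent matching a keyword inside a longer word.
        if re.search(r"\b" + re.escape(keyword) + r"\b", lowered):
            tags.add(uri)
    return sorted(tags)


def association_edges(name_lists):
    """Count taxon-name co-occurrences across text objects.

    Each element of name_lists holds the taxon names found in one text
    object (assumed output of a name finder). Each unordered pair of
    distinct names is counted once per text object in which both appear.
    """
    edges = Counter()
    for names in name_lists:
        for a, b in combinations(sorted(set(names)), 2):
            edges[(a, b)] += 1
    return edges


def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall; 0.0 when nothing is correct."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(tag_text("This herbivore is preyed upon by a predator."))
    docs = [
        ["Vulpes vulpes", "Lepus europaeus"],
        ["Lepus europaeus", "Vulpes vulpes", "Bubo bubo"],
    ]
    print(association_edges(docs))
    print(f1_score(8, 2, 2))
```

The weighted edge counts returned by `association_edges` can be loaded directly into a network tool as an edge list. Counting once per text object, rather than once per mention, is one simple design choice. It keeps a single long article from dominating the edge weights.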
