- 2015
Chinese Micro-blog Named Entity Linking Based on Multi-source Knowledge
Abstract: Named entities are important information-bearing units in text. Micro-blogs are a social network platform for sharing brief, real-time information; their posts are short and non-standard, and new words appear frequently, so named entities must be understood accurately for the text to be analyzed correctly. This paper proposes a Chinese micro-blog named entity linking method based on multi-source knowledge, which combines a synonym dictionary and encyclopedia resources with a bag-of-words model. Named entities to be linked in a micro-blog post are mapped to the corresponding candidate entities in the knowledge base. In experiments on the NLP&CC2013 Chinese micro-blog entity linking evaluation data set, the method achieves a micro-averaged accuracy of 92.97%, two percentage points higher than the best result reported in the NLP&CC2013 Chinese entity linking evaluation, which demonstrates its effectiveness.
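The abstract names the ingredients (a synonym dictionary, encyclopedia resources, and a bag-of-words model) but not how they are combined. The following is only a minimal sketch of one plausible arrangement, not the authors' implementation: an alias table generates candidates, and cosine similarity between bag-of-words vectors ranks them. The toy data, the whitespace tokenizer, the `NIL` label for unlinkable mentions, and all function names are assumptions made for illustration; a Chinese system would need a word segmenter in place of `split()`.

```python
import math
from collections import Counter

# Hypothetical alias table standing in for a synonym dictionary
# and encyclopedia redirects: surface form -> candidate entities.
ALIASES = {
    "apple": ["Apple Inc.", "Apple (fruit)"],
    "the fruit company": ["Apple Inc."],
}

# Hypothetical knowledge base: entity name -> description text.
KNOWLEDGE_BASE = {
    "Apple Inc.": "technology company that designs the iphone and mac computers",
    "Apple (fruit)": "edible fruit that grows on an apple tree",
}


def bag_of_words(text):
    """Lower-case, whitespace-tokenized term counts (assumed tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link(mention, context):
    """Return the best-matching entity for a mention, or "NIL" if none."""
    candidates = ALIASES.get(mention.lower(), [])
    if not candidates:
        return "NIL"
    context_vector = bag_of_words(context)
    # Rank candidates by similarity between the post and the entity description.
    return max(
        candidates,
        key=lambda c: cosine(context_vector, bag_of_words(KNOWLEDGE_BASE[c])),
    )


def micro_accuracy(predictions, gold):
    """Fraction of mentions linked correctly, pooled over all mentions."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


if __name__ == "__main__":
    print(link("Apple", "the new iphone from apple is a great phone"))
    print(link("apple", "i ate an apple from the tree"))
    print(link("banana", "i like banana bread"))
```

Under this sketch, a mention absent from the alias table is returned as `NIL`, and `micro_accuracy` pools all mentions before dividing, which is the usual meaning of a micro-averaged score.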