- 2016
Geographical Location Entity Recognition in Micro-blog City Complaint Texts
Abstract:
Geographical location entities in Micro-blog city complaint texts tend to be structurally complex, long, and described in considerable detail. Based on an analysis of complaint Micro-blog texts, this paper proposes a method for automatically recognizing such entities. First, the method annotates Micro-blog posts with features drawn from a feature repository and uses a conditional random field (CRF) model to identify geographical location entities. Second, the CRF output is annotated a second time according to the characteristics of Micro-blog text and of geographical location entities. Finally, a Micro-blog rule base is used to recover entities that were missed and to correct the recognized ones, yielding the final recognition result. Experiments show that the method is effective, reaching an F-score of 85.52%.
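To make the post-processing and evaluation steps concrete, here is a minimal Python sketch. The abstract does not give the actual rules or the scoring protocol, so everything here is an assumption for illustration: the rule that touching spans are merged into one longer entity, the function names `merge_adjacent` and `f_score`, the exact-match scoring, and the sample sentence with its spans.

```python
def merge_adjacent(spans):
    """Merge (start, end) spans that touch or overlap; end is exclusive.

    Assumed stand-in for a correction rule: a long address that the
    tagger split into neighbouring pieces is joined back together.
    """
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def f_score(predicted, gold):
    """Exact-match precision, recall and F-score over entity spans."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


text = "人民路12号门口的路灯坏了"
crf_spans = [(0, 3), (3, 6)]  # hypothetical tagger output: "人民路", "12号"
gold_spans = [(0, 6)]         # hypothetical gold entity: "人民路12号"

fixed = merge_adjacent(crf_spans)
print([text[s:e] for s, e in fixed])
print(f_score(crf_spans, gold_spans))
print(f_score(fixed, gold_spans))
```

In this toy case the two fragments score zero under exact matching, while the merged span matches the gold entity, which illustrates why a correction pass can matter for long location descriptions.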