Acta Electronica Sinica (电子学报), 2013

A Compound Concept Extraction Method Based on a Hybrid Decision Model

DOI: 10.3969/j.issn.0372-2112.2013.03.012, PP. 488-495

Keywords: corpus, domain concept, compound concept, weighted term frequency, term tag, position affinity, compound depth


Abstract:

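Existing methods for extracting domain concepts from large-scale domain corpora cannot effectively identify compound concepts. This paper proposes a compound concept extraction method based on a hybrid decision model. The text is first segmented into words, each term is assigned a term tag, and the term set is cleaned by removing noise words and merging synonyms. The weighted term frequency of each term is then computed, and position affinity and position matching degree are calculated from the term tag values in order to judge and select the atomic terms that can be combined into compound concepts. Finally, setting different compound depth values enables the extraction of multi-level compound concepts. Extraction experiments on corpora of different sizes show that the proposed method achieves higher recall and precision.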
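
The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea: adjacent atomic terms are merged when they co-occur often enough, and the merging is repeated to build deeper compounds. Everything in it is an assumption made for illustration: the function name `extract_compounds`, the parameters `min_count`, `min_affinity`, and `passes`, the use of plain counts in place of the weighted term frequency, and the adjacency ratio used as a stand-in for position affinity. It is not the authors' hybrid decision model.

```python
from collections import Counter


def extract_compounds(sentences, min_count=2, min_affinity=0.5, passes=2):
    """Illustrative compound-term extraction over pre-segmented sentences.

    sentences: list of token lists (word segmentation already done).
    passes: number of merge rounds, a hypothetical stand-in for the
            paper's "compound depth".
    Returns a set of compounds, each a tuple of atomic tokens.
    """
    found = set()
    # Represent every unit as a tuple of atoms so merged units concatenate.
    current = [[(tok,) for tok in sent] for sent in sentences]
    for _ in range(passes):
        # Plain frequency counts (assumption: no weighting is applied here).
        unit_freq = Counter(u for sent in current for u in sent)
        pair_freq = Counter(
            (sent[i], sent[i + 1])
            for sent in current
            for i in range(len(sent) - 1)
        )
        # Assumed affinity: adjacent co-occurrences / frequency of rarer unit.
        mergeable = {
            pair
            for pair, n in pair_freq.items()
            if n >= min_count
            and n / min(unit_freq[pair[0]], unit_freq[pair[1]]) >= min_affinity
        }
        if not mergeable:
            break
        merged = []
        for sent in current:
            out, i = [], 0
            while i < len(sent):
                # Greedy left-to-right merge of qualifying adjacent units.
                if i + 1 < len(sent) and (sent[i], sent[i + 1]) in mergeable:
                    unit = sent[i] + sent[i + 1]
                    out.append(unit)
                    found.add(unit)
                    i += 2
                else:
                    out.append(sent[i])
                    i += 1
            merged.append(out)
        current = merged
    return found


corpus = [
    ["domain", "ontology", "construction"],
    ["domain", "ontology", "learning"],
    ["domain", "ontology", "construction"],
    ["concept", "extraction"],
]
compounds = extract_compounds(corpus, passes=2)
```

With one pass the sketch only finds two-term compounds; a second pass lets an already merged unit combine with a neighbour, which is one plausible reading of how a larger compound depth yields multi-level compounds.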

