A Domain Concept Clustering Method Based on Verb Dependency Sets

DOI: 10.3969/j.issn.1006-7043.201403012

Keywords: clustering method, corpus, verb dependency set, dependency parsing, domain concept, concept overlap rate

Abstract:

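To enable effective concept clustering on a small, domain-specific corpus, this paper proposes a domain concept clustering method based on verb dependency sets. The method exploits the observation that domain concepts of the same class co-occur with particular domain verbs. A verb dependency set is drawn up with the help of domain experts; for each concept that co-occurs with the set in subject-verb and verb-object structures, a verb dependency degree is computed, and concepts whose degree exceeds a threshold are grouped into one cluster. Experiments show that the method is practical on small domain-specific corpora and that the concept overlap rate of its clusters is better than that of LSI-based and search-engine-based concept clustering methods.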
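The abstract does not give the formula for the verb dependency degree, so the following is only a minimal sketch under stated assumptions. It assumes the corpus has already been dependency-parsed into (concept, relation, verb) triples, that subject-verb and verb-object arcs carry the labels `SBV` and `VOB`, and that the dependency degree of a concept is the share of its `SBV`/`VOB` arcs whose verb belongs to the expert-defined set. The function names, the sample triples, and the threshold value are all hypothetical.

```python
from collections import Counter

# Assumed labels for subject-verb and verb-object dependency arcs.
RELATIONS = {"SBV", "VOB"}


def dependency_degree(triples, verb_set):
    """Map each concept to the share of its SBV/VOB arcs whose verb is in verb_set.

    This ratio is an illustrative stand-in; the paper's own definition
    of the dependency degree is not stated in the abstract.
    """
    total = Counter()
    hits = Counter()
    for concept, relation, verb in triples:
        if relation not in RELATIONS:
            continue  # ignore arcs outside the two structures of interest
        total[concept] += 1
        if verb in verb_set:
            hits[concept] += 1
    return {concept: hits[concept] / total[concept] for concept in total}


def cluster(triples, verb_set, threshold):
    """Return the sorted concepts whose dependency degree exceeds the threshold."""
    degrees = dependency_degree(triples, verb_set)
    return sorted(c for c, d in degrees.items() if d > threshold)


# Hypothetical parsed triples: (concept, relation, head verb).
triples = [
    ("router", "SBV", "forward"),
    ("router", "VOB", "configure"),
    ("switch", "SBV", "forward"),
    ("switch", "VOB", "buy"),
    ("weather", "SBV", "change"),
    ("router", "ATT", "forward"),  # not SBV/VOB, so it is skipped
]
verb_set = {"forward", "configure"}  # stands in for the expert-defined set

print(cluster(triples, verb_set, threshold=0.6))
```

With one verb dependency set per target class, running `cluster` once per set would yield one cluster per class; how the paper handles concepts that pass the threshold for several sets is not stated in the abstract.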
