Analysis of the Author of A Dream of Red Mansions Based on K-Means Feature Clustering Algorithm with Word Frequency

DOI: 10.12677/HJDM.2022.121008, PP. 73-79

Keywords: Word Frequency, K-Means Feature Clustering Algorithm, Similarity


Abstract:

This paper proposes a K-means feature clustering algorithm based on word frequency analysis for examining the authorship of documents whose attribution is disputed. Taking A Dream of Red Mansions as an example, the algorithm is applied to the occurrence frequencies of characteristic Chinese characters identified in the first 80 chapters and the last 40 chapters. Every 10 chapters are treated as one text, and the similarity among the first, middle, and last 40 chapters is examined. The analysis leads to the conclusion that the first 80 chapters and the last 40 chapters of A Dream of Red Mansions were probably not written by the same author.
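
The general idea in the abstract (turn each 10-chapter block into a vector of character frequencies, then cluster the vectors) can be illustrated with a minimal sketch. The abstract does not state which characters, distance measure, or number of clusters the authors used, so everything below is an assumption for illustration: the feature characters in `FEATURE_CHARS` are hypothetical, the twelve "blocks" are synthetic strings rather than text from the novel, and the clustering is plain Lloyd's k-means with Euclidean distance and deterministic farthest-point seeding, not necessarily the paper's variant.

```python
from collections import Counter
import math

# Hypothetical feature characters; the abstract does not list the real ones.
FEATURE_CHARS = ["了", "的", "之", "其"]


def char_frequencies(text, feature_chars):
    """Return the relative frequency of each feature character in `text`."""
    counts = Counter(text)
    total = len(text) or 1  # avoid division by zero on empty text
    return [counts[ch] / total for ch in feature_chars]


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Seed the next centroid with the point farthest from the current seeds.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Synthetic stand-ins for twelve 10-chapter blocks (NOT text from the novel):
# the first eight share one character profile, the last four share another.
style_a = "了的了的了之"
style_b = "之其之其之了"
blocks = [style_a * 20 + "的" * i for i in range(8)]
blocks += [style_b * 20 + "其" * i for i in range(4)]

vectors = [char_frequencies(block, FEATURE_CHARS) for block in blocks]
labels, centroids = kmeans(vectors, k=2)
print(labels)
```

On this synthetic input the first eight blocks land in one cluster and the last four in the other, which mirrors the kind of split the abstract reports. With real text, the choice of feature characters and of `k` would need to be justified separately.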

