Text Analysis of Scenic Spot Comments Based on Machine Learning

DOI: 10.12677/SA.2022.112042, PP. 388-401

Keywords: Tourism Big Data, Machine Learning, Tourist Comments, Text Classification, LDA Topic Clustering Model


Abstract:

Online tourist comments were collected from travel websites with a web crawler and used as the data source. The data were cleaned, segmented into Chinese words, and vectorized in Python, and descriptive statistics were computed on the preprocessed data. Two traditional machine learning text classifiers, Naive Bayes (NB) and Logistic Regression (LR), and a Long Short-Term Memory (LSTM) deep learning model were built. The LSTM model reached a classification accuracy of 92.15%, about 2.6 percentage points higher than LR, the most accurate of the traditional models. The LSTM model was then used to classify the comment text, and an LDA topic clustering model was built on the classified data to mine latent topics; the feature words of each topic were extracted and compared. The analysis concludes that negative comments express dissatisfaction with the infrastructure and fee management of the Shanhaiguan scenic spot, while positive comments express satisfaction with its historical and cultural heritage, visitor experience, on-site services, and entertainment value. The information mined from the comment text is intended to identify tourists' concerns and needs, to help prospective visitors make consumption choices, and to support marketing decisions by scenic spot management departments.
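To make the Naive Bayes step of the classification stage more concrete, the following is a minimal sketch of a multinomial Naive Bayes classifier with Laplace smoothing, written with the Python standard library only. It is not the authors' implementation: the function names `train_nb` and `predict_nb`, the smoothing parameter `alpha`, and the toy English tokens are all assumptions made for illustration, and the comments are assumed to be already cleaned and segmented into word lists.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on pre-tokenized documents.

    docs   -- list of token lists (assumed already cleaned and segmented)
    labels -- list of class labels, one per document
    alpha  -- Laplace smoothing constant (hypothetical default)
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {"log_prior": {}, "log_lik": {}}
    n_docs = len(docs)
    for c, n_c in class_counts.items():
        # Log prior: share of training documents in class c.
        model["log_prior"][c] = math.log(n_c / n_docs)
        # Smoothed log likelihood of each vocabulary word given class c.
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        model["log_lik"][c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return model


def predict_nb(model, tokens):
    """Return the class with the highest posterior log score.

    Tokens that never appeared in training are ignored.
    """
    scores = {}
    for c, log_prior in model["log_prior"].items():
        log_lik = model["log_lik"][c]
        scores[c] = log_prior + sum(log_lik[w] for w in tokens if w in log_lik)
    return max(scores, key=scores.get)


# Toy data invented for illustration; not taken from the paper's corpus.
train_docs = [
    ["history", "culture", "great"],
    ["service", "friendly", "great"],
    ["ticket", "expensive", "bad"],
    ["toilet", "dirty", "bad"],
]
train_labels = ["pos", "pos", "neg", "neg"]

model = train_nb(train_docs, train_labels)
print(predict_nb(model, ["great", "history"]))
print(predict_nb(model, ["ticket", "expensive"]))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied. In practice, a library implementation would likely be used for NB and LR, with separate tooling for the LSTM and LDA stages; the sketch only shows the counting and scoring logic behind the simplest of the three classifiers.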

