%0 Journal Article
%T Text Analysis of Scenic Spot Comments Based on Machine Learning
%A 郑明明
%A 王知人
%A 谢璐妍
%J Statistics and Applications
%P 388-401
%@ 2325-226X
%D 2022
%I Hans Publishing
%R 10.12677/SA.2022.112042
%X Online tourist reviews were collected from travel websites with a web crawler and used as the data source. The data were cleaned, segmented into Chinese words, and vectorized in Python, and descriptive statistics were computed on the preprocessed data. Two traditional machine learning text classifiers, Naive Bayes (NB) and Logistic Regression (LR), and a Long Short-Term Memory (LSTM) deep learning model were built. The LSTM model reached a classification accuracy of 92.15%, about 2.6 percentage points higher than LR, the more accurate of the two traditional models. The LSTM model was then used to classify the review text, and an LDA topic clustering model was built on the classified data to mine latent topics; the feature words for each topic were extracted for comparative analysis. The analysis concludes that negative reviews express dissatisfaction with the infrastructure and fee management of the Shanhaiguan scenic area, while positive reviews express strong satisfaction with its historical and cultural heritage, visitor experience, services, and entertainment value. The information mined from the review text is intended to identify tourists' concerns and needs, inform potential consumers' choices, and support marketing decisions by scenic-area management.
%K Tourism Big Data
%K Machine Learning
%K Tourist Comments
%K Text Classification
%K LDA Topic Clustering Model
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=50466