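Journal of Fuzhou University (Natural Science Edition), 2018
Research on Cultural Tourism Text Classification Techniques Based on Naive Bayes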
Abstract:
This paper introduces text classification into the study of cultural tourism texts and, based on the characteristics of such texts, proposes a cultural tourism text classification model built on naive Bayes. First, a dictionary of cultural topics is constructed, and scenic spot description texts are converted into vectors using the vector space model. Second, lexical features are selected by information gain to reduce the vector dimensionality, and each feature is weighted by term frequency-inverse document frequency (TF-IDF). Finally, a classifier model is built to classify tourism texts automatically. In the experiment, 1447 scenic spot description texts were classified into four categories: southern Fujian (Minnan) culture, Hakka culture, red culture, and ecological culture. The model achieved good classification results.
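The pipeline summarized in the abstract (vector space representation, information gain feature selection, TF-IDF weighting, naive Bayes classification) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy English tokens and labels, the function names, the smoothed IDF formula, the Laplace smoothing constant `alpha`, and the choice to feed TF-IDF weights into a multinomial naive Bayes model as fractional counts. The sketch also assumes the texts are already tokenized; real Chinese scenic spot descriptions would first need word segmentation against the topic dictionary.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document'."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    return (entropy(labels)
            - len(present) / n * entropy(present)
            - len(absent) / n * entropy(absent))

def select_features(docs, labels, k):
    """Keep the k terms with the highest information gain."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return set(ranked[:k])

def idf_table(docs, features):
    """Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    return {t: math.log((1 + n) / (1 + sum(1 for d in docs if t in d))) + 1.0
            for t in features}

def tfidf(doc, idf):
    """Sparse TF-IDF vector restricted to the selected features."""
    counts = Counter(t for t in doc if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}

def train_nb(vectors, labels, features, alpha=1.0):
    """Multinomial naive Bayes using TF-IDF weights as fractional counts."""
    weight = defaultdict(lambda: defaultdict(float))
    for vec, y in zip(vectors, labels):
        for t, w in vec.items():
            weight[y][t] += w
    model = {}
    for y, n_y in Counter(labels).items():
        total = sum(weight[y].values()) + alpha * len(features)
        log_prior = math.log(n_y / len(labels))
        log_lik = {t: math.log((weight[y][t] + alpha) / total) for t in features}
        model[y] = (log_prior, log_lik)
    return model

def predict(model, vec):
    """Return the class with the highest log posterior score."""
    def score(y):
        log_prior, log_lik = model[y]
        return log_prior + sum(w * log_lik[t] for t, w in vec.items())
    return max(model, key=score)

# Hypothetical toy corpus: pre-tokenized scenic spot descriptions.
docs = [
    ["tulou", "hakka", "village"],
    ["hakka", "tulou", "ancestral", "hall"],
    ["revolution", "memorial", "red", "army"],
    ["red", "army", "site", "revolution"],
    ["forest", "wetland", "park"],
    ["forest", "mountain", "wetland"],
]
labels = ["hakka", "hakka", "red", "red", "eco", "eco"]

features = select_features(docs, labels, k=7)
idf = idf_table(docs, features)
model = train_nb([tfidf(d, idf) for d in docs], labels, features)

print(predict(model, tfidf(["tulou", "hakka", "food"], idf)))
print(predict(model, tfidf(["red", "army", "museum"], idf)))
```

In this sketch, feature selection runs before weighting, so terms outside the selected set (such as `food` or `museum` above) are simply ignored at prediction time. Whether the paper's model handles unseen terms this way is not stated in the abstract.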