Research on Cultural Tourism Text Classification Technology Based on Naive Bayes  Cited by: 11

Classification technique of cultural tourism text based on naive Bayes


Authors: 王祥翔 (WANG Xiangxiang); 方荟 (FANG Hui); 陈崇成 (CHEN Chongcheng)[1] (Key Laboratory of Spatial Data Mining and Information Sharing of Ministry of Education, Spatial Information Research Centre of Fujian, Fuzhou University, Fuzhou, Fujian 350116, China; Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, Minjiang University, Fuzhou, Fujian 350108, China)

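Affiliations: [1] Key Laboratory of Spatial Data Mining and Information Sharing of Ministry of Education, Spatial Information Research Centre of Fujian, Fuzhou University, Fuzhou, Fujian 350116 [2] Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, Minjiang University, Fuzhou, Fujian 350108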

Source: Journal of Fuzhou University (Natural Science Edition), 2018, No. 5, pp. 644-649 (6 pages)

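Funding: Second Batch of Fujian Province Science and Technology Innovation Leading Talents Funding Project (00387005); Fujian Provincial Science and Technology Plan Key Project (2015H0015); Fujian Provincial Innovation Fund for Science and Technology-Based Small and Medium-Sized Enterprises (2015C0042)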

Abstract: This paper introduces text classification into the study of cultural tourism texts and, based on the characteristics of such texts, proposes a cultural tourism text classification model built on naive Bayes. First, a cultural topic lexicon is constructed, and scenic spot description texts are converted into vectors using the vector space model. Second, term features are selected by information gain to reduce the vector dimensionality, and each feature is weighted by term frequency-inverse document frequency (TF-IDF). Finally, a classifier is built to classify tourism texts automatically. In the experiment, 1447 scenic spot description texts were classified into four categories: Minnan (southern Fujian) culture, Hakka culture, red culture, and ecological culture. The model achieved good classification performance.
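The pipeline named in the abstract (vector space model, information-gain feature selection, TF-IDF weighting, naive Bayes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the naive Bayes variant, the smoothing, or the number of selected features, so the multinomial form with Laplace smoothing, the smoothed IDF, and `k=9` are choices made here. The toy English tokens stand in for Chinese scenic spot descriptions that would first be segmented with the help of the cultural topic lexicon, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c) if total else 0.0

def information_gain(doc_sets, labels, term):
    """Information gain of a binary term-presence feature w.r.t. the class label."""
    present = Counter(l for s, l in zip(doc_sets, labels) if term in s)
    absent = Counter(l for s, l in zip(doc_sets, labels) if term not in s)
    n = len(labels)
    h_cond = sum(sum(p.values()) / n * entropy(list(p.values())) for p in (present, absent))
    return entropy(list(Counter(labels).values())) - h_cond

def select_features(docs, labels, k):
    """Keep the k terms with the highest information gain."""
    doc_sets = [set(d) for d in docs]
    vocab = sorted({t for d in docs for t in d})
    return sorted(vocab, key=lambda t: information_gain(doc_sets, labels, t), reverse=True)[:k]

def fit_idf(docs, features):
    """Smoothed inverse document frequency (an assumed variant)."""
    n, doc_sets = len(docs), [set(d) for d in docs]
    return {t: math.log((1 + n) / (1 + sum(t in s for s in doc_sets))) + 1.0 for t in features}

def tfidf(doc, idf):
    """Sparse TF-IDF vector restricted to the selected features."""
    tf = Counter(doc)
    return {t: tf[t] * w for t, w in idf.items() if tf[t]}

class NaiveBayes:
    """Multinomial naive Bayes over TF-IDF weights with Laplace smoothing (assumed)."""
    def fit(self, vectors, labels, features, alpha=1.0):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        weight = {c: defaultdict(float) for c in self.classes}
        for vec, c in zip(vectors, labels):
            for t, w in vec.items():
                weight[c][t] += w
        self.log_like = {}
        for c in self.classes:
            total = sum(weight[c].values()) + alpha * len(features)
            self.log_like[c] = {t: math.log((weight[c][t] + alpha) / total) for t in features}
        return self

    def predict(self, vec):
        score = lambda c: self.log_prior[c] + sum(w * self.log_like[c][t] for t, w in vec.items())
        return max(self.classes, key=score)

# Hypothetical, pre-tokenized toy corpus; not data from the paper.
train = [
    ("minnan opera temple nanyin", "minnan"),
    ("minnan temple puppet opera", "minnan"),
    ("hakka tulou earthen building", "hakka"),
    ("hakka tulou clan village", "hakka"),
    ("revolution memorial martyrs site", "red"),
    ("revolution army memorial hall", "red"),
    ("forest wetland nature reserve", "ecological"),
    ("forest waterfall nature park", "ecological"),
]
docs = [text.split() for text, _ in train]
labels = [label for _, label in train]

features = select_features(docs, labels, k=9)
idf = fit_idf(docs, features)
model = NaiveBayes().fit([tfidf(d, idf) for d in docs], labels, features)

def classify(tokens):
    return model.predict(tfidf(tokens, idf))

print(classify("tulou village built by hakka families".split()))
print(classify("nature reserve with forest trails".split()))
```

Selecting features before weighting keeps the vectors short, which is the dimensionality reduction the abstract attributes to information gain. A document containing none of the selected terms falls back to the class priors alone.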

Keywords: cultural tourism; text classification; naive Bayes; information gain

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
