Parameter Optimization of Text Feature Vector of Naïve Bayesian Classifier

Cited by: 4

Authors: FANG Qiulian [1]; WANG Peijin; SUI Yang; ZHENG Hanying; LV Chunyue; WANG Yantong (School of Mathematics and Statistics, Central South University, Changsha 410083, China)

Affiliation: [1] School of Mathematics and Statistics, Central South University

Source: Journal of Jilin University (Science Edition), 2019, No. 6, pp. 1479-1484 (6 pages)

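Funding: Hunan Provincial Statistical Research Project (Grant No. 2018A01); National Undergraduate Innovation and Entrepreneurship Project (Grant No. S20190533497)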

Abstract: The naïve Bayes algorithm is used to build an automatic Chinese text classifier, and the selection of the relevant parameters is studied in order to classify Chinese text efficiently. First, in the model training stage, an N-gram model is applied to the training data set to extract feature vectors. Second, the naïve Bayes algorithm is used to build the text classifier. Finally, in the model testing stage, the term frequency-inverse document frequency (TF-IDF) algorithm is used to extract feature vectors from the test samples in order to improve classification accuracy. The experimental results show that, when extracting feature vectors from the training set, the 2-gram and 4-gram models give the best feature extraction; when choosing the feature vector length, a length of 25000 yields the largest gain in classification accuracy while keeping accuracy high; and when choosing the part of speech of the feature items, accuracy is highest when both verbs and nouns are selected and lowest when only verbs are selected.
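The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy documents, the use of character-level n-grams, the Laplace smoothing, and the way TF-IDF weights scale the log-likelihood terms at test time are all assumptions made for illustration. The abstract only states that N-gram features are used for training, naïve Bayes for classification, and TF-IDF for the test samples.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of text (assumed tokenization)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_vocabulary(docs, n=2, max_features=25000):
    """Keep the max_features most frequent n-grams, i.e. the feature vector length."""
    counts = Counter()
    for doc in docs:
        counts.update(char_ngrams(doc, n))
    return {gram for gram, _ in counts.most_common(max_features)}


def train_naive_bayes(docs, labels, vocab, n=2):
    """Estimate log priors and Laplace-smoothed log likelihoods per class."""
    class_docs = Counter(labels)
    gram_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        gram_counts[label].update(g for g in char_ngrams(doc, n) if g in vocab)
    log_prior = {c: math.log(k / len(docs)) for c, k in class_docs.items()}
    log_like = {}
    for c in class_docs:
        total = sum(gram_counts[c].values()) + len(vocab)
        log_like[c] = {g: math.log((gram_counts[c][g] + 1) / total) for g in vocab}
    return log_prior, log_like


def tfidf_weights(doc, train_docs, vocab, n=2):
    """TF-IDF weight of each in-vocabulary n-gram of a test document."""
    tf = Counter(g for g in char_ngrams(doc, n) if g in vocab)
    n_docs = len(train_docs)
    weights = {}
    for gram, freq in tf.items():
        # Document frequency: a character n-gram occurs in a doc iff it is a substring.
        df = sum(1 for d in train_docs if gram in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF (assumption)
        weights[gram] = freq * idf
    return weights


def classify(doc, model, train_docs, vocab, n=2):
    """Pick the class with the highest TF-IDF-weighted log posterior (assumed scoring)."""
    log_prior, log_like = model
    weights = tfidf_weights(doc, train_docs, vocab, n)
    scores = {
        c: log_prior[c] + sum(w * log_like[c][g] for g, w in weights.items())
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical toy corpus, for illustration only.
train_docs = ["球队赢得比赛冠军", "球员进球比赛精彩", "股票市场价格上涨", "银行利率市场调整"]
train_labels = ["sports", "sports", "finance", "finance"]

vocab = build_vocabulary(train_docs, n=2, max_features=25000)
model = train_naive_bayes(train_docs, train_labels, vocab, n=2)
print(classify("比赛进球", model, train_docs, vocab, n=2))
```

The parameters studied in the paper map onto `n` (the N-gram order) and `max_features` (the feature vector length). Restricting features by part of speech would require a Chinese tagger and is omitted from this sketch.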

Keywords: naïve Bayes classifier; feature selection; TF-IDF algorithm; N-gram model

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]
