Text Classification Method of Naive Bayes Algorithm Based on MapReduce

Cited by: 7


Authors: ZHANG Chenyue (张晨跃), LIU Lizhi (刘黎志)[1], DENG Kaiwei (邓开巍), LIU Jie (刘杰)[1]

Affiliation: [1] Hubei Key Laboratory of Intelligent Robot (Wuhan Institute of Technology), Wuhan, Hubei 430205, China

Source: Journal of Wuhan Institute of Technology, 2021, No. 1, pp. 102-105 (4 pages)

Funding: 2017 Guiding Project of the Scientific Research Program of the Hubei Provincial Department of Education (B2017051).

Abstract: To address the low classification performance of the traditional serial Naive Bayes algorithm, a parallelized classification method based on Naive Bayes is proposed. Multinomial Naive Bayes is selected and a Hadoop cluster is built. Feature words are selected with the chi-square test; the weight of each feature term is computed with term frequency-inverse document frequency (TF-IDF), and the sum of the weights of each class is obtained; finally, the weights are applied to the Naive Bayes formula to obtain the classification results. Experimental results show that, compared with the traditional Naive Bayes method, the parallelized Naive Bayes classification method designed on this cluster improves precision, recall, and F1 score by at least 7.66%, 7.56%, and 11.98%, respectively, and takes less time, which shows that the method can improve the time efficiency of text processing.
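The abstract selects feature words with the chi-square test but does not say how many terms are kept or how scores are combined across classes. The minimal sketch below uses the standard 2x2 contingency-table statistic, chi2(t, c) = N(AD - BC)^2 / ((A+B)(C+D)(A+C)(B+D)), computed from document counts, and assumes the top k terms of each class are kept and merged into one feature set. The names `chi_square`, `select_features`, and `k` are hypothetical, not taken from the paper.

```python
from collections import Counter


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic of a 2x2 term/class contingency table.

    a: docs in the class that contain the term
    b: docs outside the class that contain the term
    c: docs in the class that lack the term
    d: docs outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_features(docs, labels, k):
    """Return the union of the k highest-scoring terms of each class (assumed rule)."""
    doc_sets = [set(doc) for doc in docs]
    n_docs = len(docs)
    class_size = Counter(labels)
    # term_df[t]: number of docs containing t; df[(t, y)]: same, restricted to class y
    term_df = Counter(t for terms in doc_sets for t in terms)
    df = Counter((t, y) for terms, y in zip(doc_sets, labels) for t in terms)
    selected = set()
    for y, n_y in class_size.items():
        scores = {}
        for t, n_t in term_df.items():
            a = df[(t, y)]
            b = n_t - a
            c = n_y - a
            d = n_docs - n_t - c
            scores[t] = chi_square(a, b, c, d)
        # Highest score first; ties broken alphabetically so the result is deterministic
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        selected.update(ranked[:k])
    return selected
```

Because (AD - BC) is squared, the statistic also rewards terms strongly associated with the absence of a class. On a toy corpus of two "sport" and two "politics" documents, a word such as "election" therefore ranks highly for both classes.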
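The abstract says TF-IDF weights are computed for each feature term, summed per class, and applied to the multinomial Naive Bayes formula, but it does not give the exact estimator. One common choice is to replace raw term counts with summed TF-IDF weights, P(t | c) = (W[c, t] + alpha) / (W[c] + alpha * |V|), where W[c, t] is the total weight of term t in class c and W[c] is the class's weight sum. The minimal sketch below assumes that estimator, Laplace smoothing with alpha = 1, idf(t) = log(N / df(t)), and raw term frequency at classification time. It simulates the Hadoop map, shuffle, and reduce phases in memory rather than calling the Hadoop API; all names (`WeightedNB`, `map_doc`, `reduce_sum`, `run_job`, `train`, `classify`) are hypothetical.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass
from itertools import groupby


@dataclass
class WeightedNB:
    weights: dict       # (label, term) -> summed TF-IDF weight W[c, t]
    class_total: dict   # label -> W[c], the weight sum of the class
    log_priors: dict    # label -> log P(c)
    vocab: set
    alpha: float = 1.0  # Laplace smoothing (assumption)


def idf_table(docs):
    """idf(t) = log(N / df(t)) over the training documents (assumed variant)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log(n / c) for t, c in df.items()}


def map_doc(doc, label, idf):
    """Mapper: emit ((label, term), tf * idf) for one training document."""
    for term, tf in Counter(doc).items():
        yield (label, term), tf * idf[term]


def reduce_sum(key, values):
    """Reducer: add up every weight emitted for one (label, term) key."""
    return key, sum(values)


def run_job(docs, labels, idf):
    """In-memory stand-in for one MapReduce job: map, shuffle (sort by key), reduce."""
    pairs = [kv for doc, y in zip(docs, labels) for kv in map_doc(doc, y, idf)]
    pairs.sort(key=lambda kv: kv[0])  # equal keys become adjacent, as after a shuffle
    return dict(
        reduce_sum(key, (v for _, v in group))
        for key, group in groupby(pairs, key=lambda kv: kv[0])
    )


def train(docs, labels, alpha=1.0):
    idf = idf_table(docs)
    weights = run_job(docs, labels, idf)
    # Per-class weight sums W[c]; done in the driver here for brevity
    class_total = defaultdict(float)
    for (label, _), w in weights.items():
        class_total[label] += w
    n = len(docs)
    log_priors = {c: math.log(k / n) for c, k in Counter(labels).items()}
    return WeightedNB(weights, dict(class_total), log_priors, set(idf), alpha)


def classify(model, doc):
    """Return argmax_c of log P(c) + sum_t tf(t) * log P(t | c)."""
    v = len(model.vocab)
    tf = Counter(t for t in doc if t in model.vocab)  # terms unseen in training are ignored

    def score(c):
        denom = model.class_total[c] + model.alpha * v
        return model.log_priors[c] + sum(
            n * math.log((model.weights.get((c, t), 0.0) + model.alpha) / denom)
            for t, n in tf.items()
        )

    return max(model.log_priors, key=score)
```

With this idf variant, a term that occurs in every training document gets weight 0 and influences the scores only through smoothing. On a real cluster the per-class sums could instead come from a second job or a combiner; the sketch keeps them in the driver only to stay short.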
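The evaluation reports precision, recall, and F1 score. The abstract does not say whether these are macro- or micro-averaged over classes; the minimal sketch below assumes per-class scores averaged with equal weight (macro-averaging). The function names `prf` and `macro_prf` are hypothetical.

```python
def prf(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_prf(y_true, y_pred):
    """Unweighted mean of the per-class precision, recall, and F1 (macro-averaging)."""
    labels = sorted(set(y_true))
    scores = [prf(y_true, y_pred, c) for c in labels]
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))
```

For true labels ["a", "a", "b", "b"] and predictions ["a", "b", "b", "b"], class "a" has precision 1 and recall 0.5, class "b" has precision 2/3 and recall 1, so the macro averages are 5/6, 0.75, and 11/15.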

Keywords: Naive Bayes; classification; parallelization; MapReduce

CLC number: TP311 [Automation and Computer Technology / Computer Software and Theory]

 
