Naive Bayes classification algorithm based on improved feature weighting  (cited by: 28)

Authors: Ding Yue, Wang Xueming [1] (College of Computer Science & Technology, Guizhou University, Guiyang 550025, China)

Affiliation: [1] College of Computer Science and Technology, Guizhou University

Source: Application Research of Computers (《计算机应用研究》), 2019, no. 12, pp. 3597-3600, 3627 (5 pages)

Funding: National Natural Science Foundation of China ([2011]61163049); Natural Science Foundation of Guizhou Province (黔科合J字[2014]7641)

Abstract: The traditional naive Bayes classification algorithm treats all feature terms as equally important instead of weighting each term by how much it contributes to classification, which makes the classification results inaccurate. To address this problem, this paper introduces the Jensen-Shannon (JS) divergence and uses it to measure the amount of information a feature term provides. To compensate for the shortcomings of the JS divergence, the paper then adjusts and corrects it from three aspects: the within-class and between-class term frequency, the within-class and between-class text (document) frequency, and the inverse category frequency corrected by the coefficient of variation. Finally, it computes a weight for each feature term and incorporates these weights into the naive Bayes formula. Comparative experiments against other algorithms show that the naive Bayes algorithm built on JS divergence and improved from the word, text and category aspects achieves the best classification results. The naive Bayes classification algorithm with JS-divergence-based feature weighting therefore substantially outperforms the other classification algorithms compared.
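The abstract describes two steps, scoring each term with a JS divergence and plugging the resulting weights into naive Bayes, without giving the exact formulas. The sketch below illustrates that idea under stated assumptions; it is not the authors' method. It assumes the weight of a term t is the JS divergence between the class distribution of documents containing t, P(C | t), and the class prior P(C), and that the weighted classifier scores a document as log P(c) + Σ_t w_t · n_t · log P(t | c), where n_t is the count of t in the document. The paper's correction factors (within-class and between-class term and text frequency, and the CV-corrected inverse category frequency) are not reproduced here. The function names `train`, `predict`, `kl_divergence` and `js_divergence` are hypothetical.

```python
import math
from collections import Counter


def kl_divergence(p, q):
    """KL(p || q) in bits; positions where p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def train(docs, labels, alpha=1.0):
    """Fit multinomial naive Bayes plus one (assumed) JS-based weight per term.

    docs: list of token lists; labels: list of class labels of equal length.
    """
    classes = sorted(set(labels))
    vocab = sorted({t for d in docs for t in d})
    prior = {c: labels.count(c) / len(docs) for c in classes}

    # Term counts per class, then Laplace-smoothed P(t | c).
    counts = {c: Counter() for c in classes}
    for d, y in zip(docs, labels):
        counts[y].update(d)
    cond = {}
    for c in classes:
        total = sum(counts[c].values())
        cond[c] = {t: (counts[c][t] + alpha) / (total + alpha * len(vocab))
                   for t in vocab}

    # Assumed weight: JS divergence between P(C | t), estimated from document
    # frequencies, and the prior P(C). A term distributed over the classes
    # exactly like the prior gets weight 0; a class-specific term gets more.
    p_c = [prior[c] for c in classes]
    weights = {}
    for t in vocab:
        df = [sum(1 for d, y in zip(docs, labels) if y == c and t in d)
              for c in classes]
        n = sum(df)  # >= 1, since every vocabulary term occurs in some document
        weights[t] = js_divergence([x / n for x in df], p_c)
    return classes, prior, cond, weights


def predict(model, doc):
    """Return the class maximising log P(c) + sum_t w_t * n_t * log P(t | c)."""
    classes, prior, cond, weights = model
    tf = Counter(doc)
    best, best_score = None, -math.inf
    for c in classes:
        score = math.log(prior[c])
        for t, n in tf.items():
            if t in weights:  # terms unseen in training are skipped
                score += weights[t] * n * math.log(cond[c][t])
        if score > best_score:
            best, best_score = c, score
    return best
```

For example, after training on the toy documents `["ball", "goal", "team"]` and `["goal", "match", "team"]` (label `"sport"`) and `["stock", "market", "price"]` and `["market", "bank", "price"]` (label `"finance"`), `predict(model, ["goal", "team"])` returns `"sport"`. Because each log-likelihood term is scaled by its weight, a term whose class distribution matches the prior contributes nothing. The corrections described in the abstract would further adjust `weights[t]`; their exact form would have to come from the full paper.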

Keywords: text classification; naive Bayes; JS divergence; term frequency; text frequency; category frequency

CLC number: TP391.1 [Automation and Computer Technology / Computer Application Technology]
