Research on Spam Filtering Technology Based on IMI-WNB Algorithm  Cited by: 3


Authors: LIU Jie[1], WANG Zheng, WANG Hui[1] (School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan 454000, China)

Affiliation: [1] School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan 454000, China

Source: Computer Engineering, 2020, No. 12, pp. 299-304, 312 (7 pages)

Funding: National Natural Science Foundation of China (61300216).

Abstract: When Mutual Information (MI) and the Naive Bayes (NB) algorithm are applied to spam filtering, they suffer from feature redundancy and from an independence assumption that does not hold. To address these problems, this paper proposes an Improved Mutual Information-Weighted Naive Bayes (IMI-WNB) algorithm. To address the low efficiency of MI, a word frequency factor and an inter-class difference factor are introduced, giving an improved MI feature selection algorithm that achieves more efficient feature dimensionality reduction. To address the independence assumption of the NB classifier, the Improved Mutual Information (IMI) value is used to weight features during NB classification, which removes part of the adverse effect of the conditional independence assumption on mail classification. Experimental results show that, compared with the traditional NB algorithm, the proposed algorithm improves the accuracy, recall, and stability of spam filtering.
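The two-stage idea in the abstract (score features with a modified MI, then reuse those scores as naive Bayes feature weights) can be illustrated with a minimal sketch. This is not the paper's formulation: the abstract does not give the definitions of the word frequency factor or the inter-class difference factor, so the forms used here (`log(1 + tf)` and the spread of P(t|c) across classes), the scaling of weights into (0, 1], and the names `train`, `classify`, and `top_k` are all assumptions made for illustration.

```python
import math
from collections import Counter


def train(docs, labels, top_k=50):
    """Fit an MI-weighted multinomial NB. docs: token lists; labels: class names."""
    classes = sorted(set(labels))
    n = len(docs)
    n_docs = Counter(labels)                  # documents per class
    tf = {c: Counter() for c in classes}      # term occurrences per class
    df = {c: Counter() for c in classes}      # documents containing term, per class
    for tokens, y in zip(docs, labels):
        tf[y].update(tokens)
        df[y].update(set(tokens))
    vocab = set().union(*(tf[c] for c in classes))

    score = {}
    for t in vocab:
        df_t = sum(df[c][t] for c in classes)
        p_t = [df[c][t] / n_docs[c] for c in classes]
        diff = max(p_t) - min(p_t)            # ASSUMED inter-class difference factor
        best = 0.0
        for c in classes:
            if df[c][t] == 0:
                continue
            mi = math.log(df[c][t] * n / (df_t * n_docs[c]))  # pointwise MI(t, c)
            freq = math.log(1 + tf[c][t])     # ASSUMED word frequency factor
            best = max(best, mi * freq * diff)
        score[t] = best

    # Feature selection: keep the top_k terms with a positive score.
    ranked = sorted(vocab, key=lambda t: (-score[t], t))[:top_k]
    selected = [t for t in ranked if score[t] > 0]
    top = max((score[t] for t in selected), default=1.0)
    weights = {t: score[t] / top for t in selected}   # ASSUMED scaling into (0, 1]

    log_prior = {c: math.log(n_docs[c] / n) for c in classes}
    log_like = {}
    for c in classes:
        total = sum(tf[c][t] for t in selected) + len(selected)  # Laplace smoothing
        log_like[c] = {t: math.log((tf[c][t] + 1) / total) for t in selected}
    return {"classes": classes, "weights": weights,
            "log_prior": log_prior, "log_like": log_like}


def classify(model, tokens):
    """Return the class with the highest weighted log-posterior."""
    def posterior(c):
        return model["log_prior"][c] + sum(
            model["weights"][t] * model["log_like"][c][t]
            for t in tokens if t in model["weights"])
    return max(model["classes"], key=posterior)
```

In this sketch a weight multiplies each term's log-likelihood, so a term with a low score contributes less to the decision than it would under plain naive Bayes. That is one common way to soften the independence assumption, and it may differ from the weighting scheme the authors actually use.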

Keywords: mutual information; spam filtering; weighted naive Bayes algorithm; feature selection; word frequency

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
