RSSI-based Bayesian anti-spam filtering algorithm


Authors: Chen Tiejun (陈铁军)[1], Jing Fengnian (靖丰年), Duan Yihai (段谊海)

Affiliation: [1] School of Electrical Engineering, Zhengzhou University, Zhengzhou, Henan 450001, China

Source: Computer Engineering and Design (《计算机工程与设计》), 2015, No. 7, pp. 1790-1793 (4 pages)

Funding: Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education (20114101110005)

Abstract: When existing Bayesian algorithms are applied to spam filtering, the Bernoulli model has low accuracy and cannot distinguish the importance of text features, while the multinomial model is computationally expensive, wastes time on irrelevant feature items, and is sensitive to feature items that occur only rarely. To address these shortcomings, the RSSI (remove similar and sensitive items) feature model is proposed. By computing and comparing the occurrence frequencies of feature items, it removes irrelevant and sensitive feature items, which reduces computation, improves accuracy, and reduces overfitting. Matlab simulation results show that, compared with existing algorithms such as naive Bayes and the support vector machine (SVM), the RSSI algorithm significantly reduces classification time and lowers the probability that legitimate email is misclassified as spam.

Keywords: email classification; Bayesian classifier; feature extraction; multinomial event model; overfitting

Classification code: TP391.9 [Automation and Computer Technology: Computer Application Technology]
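
The abstract describes the feature selection only in outline: compare how often each feature item occurs, then drop items that are irrelevant or overly sensitive. The Python sketch below is one possible reading of that idea, not the authors' published procedure. The function name `select_features`, the thresholds `min_count` and `similarity_band`, and the use of add-one smoothing are all assumptions made for illustration.

```python
from collections import Counter


def select_features(spam_docs, ham_docs, min_count=2, similarity_band=1.5):
    """Return the tokens kept after frequency-based filtering.

    Hypothetical reading of the abstract (not the paper's exact rule):
      * "sensitive" items: tokens seen fewer than `min_count` times overall;
      * "similar" items: tokens whose smoothed relative frequency in spam
        and in legitimate mail differs by less than `similarity_band` times.
    """
    spam_counts = Counter(tok for doc in spam_docs for tok in doc)
    ham_counts = Counter(tok for doc in ham_docs for tok in doc)
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    vocab = set(spam_counts) | set(ham_counts)

    kept = set()
    for tok in vocab:
        s, h = spam_counts[tok], ham_counts[tok]
        if s + h < min_count:
            continue  # too rare: likely to cause overfitting
        # Add-one smoothed relative frequencies (assumed, to avoid division by zero).
        freq_spam = (s + 1) / (spam_total + len(vocab))
        freq_ham = (h + 1) / (ham_total + len(vocab))
        ratio = max(freq_spam, freq_ham) / min(freq_spam, freq_ham)
        if ratio < similarity_band:
            continue  # about equally common in both classes: little evidence
        kept.add(tok)
    return kept


# Toy corpus, made up for this example.
spam_docs = [["free", "money", "now", "the"],
             ["free", "offer", "the"],
             ["money", "free", "the"]]
ham_docs = [["meeting", "the", "agenda"],
            ["the", "report", "meeting"],
            ["lunch", "the", "now"]]

kept = select_features(spam_docs, ham_docs)
print(sorted(kept))  # ['free', 'meeting', 'money']
```

On this toy corpus, "the" and "now" are dropped because they are about equally frequent in both classes, and "offer", "agenda", "report", and "lunch" are dropped because they occur only once. A multinomial naive Bayes classifier restricted to the remaining tokens would sum over fewer terms per message, which is the kind of saving in classification time the abstract reports.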

 
