Research on and Improvement of Feature Selection Algorithms in Mail Filtering  Cited by: 8

Improvement of feature selection method in spam filtering


Authors: Lu Yangzhu (卢扬竹)[1], Zhang Xinyou (张新有)[1], Qi Yu (祁玉)[1]

Affiliation: [1] School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031

Source: Journal of Computer Applications (《计算机应用》), 2009, No. 10, pp. 2812-2815 (4 pages)

Abstract: Content-based spam filtering techniques, in particular feature selection algorithms, were studied. On that basis, the Mutual Information (MI) algorithm was analyzed and, in light of the characteristics of mail filtering, improved with respect to three indicators: frequency, concentration, and dispersion. Because the original MI algorithm already accounts for dispersion, word frequency was introduced to represent frequency, and a class contribution ratio was used to measure each feature's contribution to classification, that is, to represent concentration. The improved MI formula and algorithm were given. Finally, a mail classification experiment was conducted on a real e-mail training set; the results show that the improved MI algorithm effectively improves mail classification performance.
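For orientation, the conventional MI criterion that the abstract takes as its starting point can be sketched as below. This is a minimal illustration, not the authors' method: the improved formula (with word frequency and the class contribution ratio) is not reproduced in this record, so the code covers only the textbook pointwise score MI(t, c) = log(P(t, c) / (P(t) P(c))), estimated from document counts. The function names, the `smoothing` parameter, and the toy data are assumptions made for the example.

```python
import math
from collections import Counter


def mutual_information(docs, labels, target="spam", smoothing=1.0):
    """Score each term by conventional pointwise MI with the target class.

    docs is a list of token lists, labels the matching class labels.
    Probabilities are estimated from document counts; `smoothing` is added
    to each cell of the 2x2 term/class table (an assumption of this sketch)
    so that terms absent from the target class do not produce log(0).
    """
    n_docs = len(docs)
    n_target = sum(1 for y in labels if y == target)
    df_all = Counter()     # number of documents containing the term
    df_target = Counter()  # number of target-class documents containing it
    for tokens, y in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            df_all[term] += 1
            if y == target:
                df_target[term] += 1

    total = n_docs + 4 * smoothing  # four smoothed cells in the 2x2 table
    p_c = (n_target + 2 * smoothing) / total
    scores = {}
    for term, df in df_all.items():
        p_tc = (df_target[term] + smoothing) / total
        p_t = (df + 2 * smoothing) / total
        scores[term] = math.log(p_tc / (p_t * p_c))
    return scores


def select_features(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    # Toy corpus, invented purely for illustration.
    docs = [["free", "money", "now"], ["free", "offer"],
            ["meeting", "now"], ["project", "meeting"]]
    labels = ["spam", "spam", "ham", "ham"]
    scores = mutual_information(docs, labels)
    print(select_features(scores, 3))
```

Because this score depends only on whether a term occurs in a document, it ignores how often the term occurs within each message; the abstract names that gap (frequency) as one of the three indicators the paper's improvement addresses.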

Keywords: spam; text classification; feature selection; mutual information

Classification number: TP393 [Automation and Computer Technology: Computer Application Technology]

 
