Affiliation: [1] School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031
Source: Journal of Computer Applications (《计算机应用》), 2009, No. 10, pp. 2812-2815 (4 pages)
Abstract: Content-based spam filtering techniques, in particular feature selection algorithms, were studied. On that basis, the Mutual Information (MI) algorithm was analyzed and adapted to the characteristics of e-mail filtering, with improvements along three indicators: frequency, concentration, and dispersion. The original MI algorithm already accounts for dispersion; the improved version introduces word frequency to represent frequency, and uses a class contribution ratio (a ratio of mutual information) to measure each feature's contribution to classification, which represents concentration. The improved MI formula and algorithm are given. Finally, a classification experiment was run on a real e-mail training set; the results show that the improved MI algorithm effectively improves spam classification performance.
Classification number: TP393 [Automation and Computer Technology: Computer Application Technology]
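For orientation, the conventional mutual information score that the abstract takes as its starting point can be sketched as below. This is a minimal illustration of the standard document-count estimate MI(t, c) = log(P(t, c) / (P(t) P(c))), not the paper's improved formula, which this record does not reproduce. The function name `mutual_information`, the token-list input format, and the `"spam"` label are assumptions made for the example.

```python
import math
from collections import Counter


def mutual_information(docs, labels, target="spam"):
    """Score terms by conventional MI with the target class.

    docs:   list of token lists (one per e-mail); hypothetical input format.
    labels: list of class labels aligned with docs.
    Returns {term: MI(term, target)} for terms seen at least once in the
    target class. Terms absent from that class would score -infinity and
    are skipped.
    """
    n = len(docs)
    n_class = sum(1 for y in labels if y == target)

    df_total = Counter()  # number of documents containing the term
    df_class = Counter()  # number of target-class documents containing it
    for tokens, y in zip(docs, labels):
        for term in set(tokens):  # document frequency: count once per doc
            df_total[term] += 1
            if y == target:
                df_class[term] += 1

    scores = {}
    for term, a in df_class.items():
        # log( P(t, c) / (P(t) * P(c)) ) with probabilities estimated
        # from document counts: (a/n) / ((df_total/n) * (n_class/n))
        scores[term] = math.log(a * n / (df_total[term] * n_class))
    return scores
```

Because this estimate uses only document counts, a term that occurs in a single spam message can outscore a term that occurs in many, which is the kind of weakness a word-frequency term is commonly introduced to offset.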