Author: Gu Wei [1]
Source: Office Informatization (《办公自动化》), 2018, No. 1, pp. 55-57 (3 pages)
Abstract: This paper analyzes content-based spam filtering and observes that spam filtering differs in many respects from ordinary text classification and text mining. Starting from the fact that an email's structure differs from that of plain text, it makes a series of improvements to the Bayesian filtering method, proposes a threshold adjustment algorithm, and designs an integrated weighted model that makes full use of the email's structural information. Under this model, separate models are built for the mail header and the mail body, and their results are combined by weighting to filter spam. Tests of the improved and extended Bayesian filter on the latest standard data sets show a considerable improvement in filtering performance over the classical Bayesian filter Bogo.
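Keywords: integrated weighted Bayes; minimum-risk Bayes; active-learning Bayes; feature selection; threshold adjustment
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]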
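
The integrated weighted model described in the abstract can be illustrated with a minimal sketch: one naive Bayes model for the mail header, one for the mail body, and a weighted combination compared against a threshold. The abstract gives neither the weights nor the threshold adjustment algorithm, so everything here is an assumption made for illustration: the `NaiveBayes` class, the `classify` function, the `header_weight` and `threshold` values, and the toy training data are all hypothetical and are not the author's implementation.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, tokens, label):
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1

    def spam_probability(self, tokens):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed log prior for the class.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                # Extra +1 in the denominator reserves mass for unseen tokens.
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Turn the two log scores into P(spam | tokens); clamp to avoid overflow.
        diff = max(min(log_scores["ham"] - log_scores["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


def classify(header_model, body_model, header_tokens, body_tokens,
             header_weight=0.4, threshold=0.5):
    """Weighted combination of header and body scores.

    header_weight and threshold are hypothetical values, not taken from the paper.
    """
    p_header = header_model.spam_probability(header_tokens)
    p_body = body_model.spam_probability(body_tokens)
    combined = header_weight * p_header + (1.0 - header_weight) * p_body
    return combined, combined >= threshold


# Toy training data (invented for this sketch).
header_model = NaiveBayes()
header_model.train(["free", "offer"], "spam")
header_model.train(["meeting", "agenda"], "ham")

body_model = NaiveBayes()
body_model.train(["win", "money", "now"], "spam")
body_model.train(["project", "report", "attached"], "ham")

spam_score, spam_flag = classify(header_model, body_model,
                                 ["free", "offer"], ["win", "money"])
ham_score, ham_flag = classify(header_model, body_model,
                               ["meeting", "agenda"], ["project", "report"])
```

Raising `threshold` trades missed spam for fewer legitimate messages being flagged, which is presumably the trade-off a threshold adjustment algorithm would tune; the fixed value used here is only a placeholder.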