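Authors: Ding Huafu [1], Wang Yingying [1], Han Yong [2], Min Li [2], Zou Yu [2]
Affiliations: [1] School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, Heilongjiang 150080; [2] School of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, Heilongjiang 150050
Source: Journal of Heilongjiang Institute of Technology, 2012, No. 2, pp. 65-69 (5 pages)
Funding: Science and Technology Research (General) Project of the Heilongjiang Provincial Department of Education (12511444)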
Abstract: Spam filtering based on machine learning is currently the mainstream approach to spam filtering. Machine learning models fall into two main categories: generative models, represented by Naive Bayes (NB), and discriminative models, represented by Logistic Regression (LR) and Support Vector Machines (SVM). Previous studies of the two model types have each focused on a single language, and few have examined how far the models are independent of, or dependent on, language. This article therefore compares the filtering performance of typical generative and discriminative models on Chinese and English datasets. It compares Bogo (a system based on the Bayesian algorithm, and a typical generative model) with Logistic Regression and Relaxed Online SVM (ROSVM), two typical discriminative models. The experiments use the public English datasets TREC05p-1 and TREC06p and the public Chinese datasets TREC06c and SEWM2011, with immediate feedback. The results show that filters based on discriminative models clearly outperform the filter based on a generative model, and that the same model performs better on the Chinese datasets. ROSVM gives the best performance on Chinese spam filtering.
Classification No.: TP393 [Automation and Computer Technology: Computer Application Technology]