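Affiliations: [1] School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050; [2] Shaanxi Xiyu Expressway Co., Ltd., Hancheng, Shaanxi 715400; [3] Department of Computer Science, Shaanxi University of Technology, Hanzhong, Shaanxi 723003
Source: Journal of University of Electronic Science and Technology of China (《电子科技大学学报》), 2007, No. 6, pp. 1223-1227 (5 pages)
Funding: National Key Technology R&D Program of the 11th Five-Year Plan (2006BAF01A21)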
Abstract: Spam has drawn widespread concern, yet current mail filtering methods lose semantic information and handle bulk (mass-mailed) spam inefficiently. To address both problems, this paper proposes a spam filtering model based on Latent Semantic Analysis (LSA) and Message-Digest algorithm 5 (MD5). LSA is used to mark the latent feature terms in spam, which brings semantic analysis into the filtering technique. Building on the LSA analysis, MD5 generates an "e-mail fingerprint" for bulk spam, which addresses the inefficiency of filtering such mail. A spam filtering system is designed around this model and evaluated on a self-selected dataset. Compared with a Naive Bayes filter, the proposed method performs better at spam filtering; the experimental results meet expectations and support the feasibility and advantages of the approach.
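Keywords: e-mail fingerprint; feature extraction; latent semantic analysis; MD5 algorithm; sliding window; spam filtering
Classification code: TP393.098 [Automation and Computer Technology: Computer Application Technology]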
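The "e-mail fingerprint" idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not describe how feature terms are chosen or combined, so the sketch swaps in a simple term-frequency ranking as a stand-in for the LSA step, and the function names `feature_terms`, `mail_fingerprint`, and `is_known_bulk`, as well as the parameter `k`, are hypothetical. Only the final hashing step, an MD5 digest over the selected terms, corresponds to what the abstract names.

```python
import hashlib
import re
from collections import Counter


def feature_terms(text, k=5):
    """Return the k most frequent word tokens of a message.

    Assumption: plain term frequency is a placeholder for the LSA-based
    selection of latent feature terms mentioned in the abstract.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(tokens)
    # Rank by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def mail_fingerprint(text, k=5):
    """Hash the sorted feature terms with MD5 and return the hex digest."""
    terms = sorted(feature_terms(text, k))
    return hashlib.md5(" ".join(terms).encode("utf-8")).hexdigest()


def is_known_bulk(text, known_fingerprints, k=5):
    """Check a message's fingerprint against a set of known bulk-spam fingerprints."""
    return mail_fingerprint(text, k) in known_fingerprints


# Two copies of a bulk message that differ only in the greeting share a fingerprint.
copy_a = "Cheap pills, cheap pills! Buy now, buy now. Offer offer. Hello Alice"
copy_b = "Cheap pills, cheap pills! Buy now, buy now. Offer offer. Hello Bob"
known = {mail_fingerprint(copy_a)}
print(is_known_bulk(copy_b, known))
```

Because the digest is computed over a small set of feature terms rather than the whole message, copies of a bulk mailing that vary only in incidental words map to the same fingerprint, so a set lookup can replace a full classification pass for mail already seen.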