Research on the Application of the LSA and MD5 Algorithms in a Spam Filtering System  Cited by: 3

Research of Spam Filtering System Based on Latent Semantic Analysis and MD5


Authors: Zhang Qiuyu[1], Sun Jingtao[1], Yan Xiaowen, Huang Wenhan[3]

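Affiliations: [1] School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050; [2] Shaanxi Xiyu Expressway Co., Ltd., Hancheng, Shaanxi 715400; [3] Department of Computer Science, Shaanxi University of Technology, Hanzhong, Shaanxi 723003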

Source: Journal of University of Electronic Science and Technology of China, 2007, No. 6, pp. 1223-1227 (5 pages)

Funding: National Key Technology R&D Program of the 11th Five-Year Plan (2006BAF01A21)

Abstract: Spam has become a widespread concern, yet current mail filtering methods lose semantic information and handle bulk-mailed spam inefficiently. To address both problems, this paper proposes a spam filtering model based on Latent Semantic Analysis (LSA) and Message-Digest algorithm 5 (MD5). LSA is used to mark the latent feature terms in spam, which brings semantic analysis into the filtering technique. Building on the LSA results, MD5 is used to generate an "e-mail fingerprint" for bulk-mailed spam, which addresses the low efficiency of filters on such mail. A spam filtering system is designed around this model and evaluated on a self-selected dataset. Compared with a Naive Bayes filter, the proposed method performs better at spam filtering; the experimental results meet expectations and support the feasibility and advantages of the method.
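
The abstract does not spell out how the e-mail fingerprint is computed, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a set of feature terms is already available (in the paper these would come from the LSA step; here `FEATURE_TERMS` is a hypothetical hand-picked set) and hashes the feature terms found in a message with MD5, so that reworded copies of the same bulk mailing can map to the same digest. The names `fingerprint`, `is_bulk_spam`, and `blacklist` are illustrative assumptions.

```python
import hashlib
import re

# Hypothetical feature terms; assumed to be the output of an LSA step
# that is not implemented in this sketch.
FEATURE_TERMS = {"free", "prize", "winner", "click", "claim"}


def fingerprint(text, feature_terms):
    """Return an MD5 hex digest of the feature terms present in `text`.

    Returns None when no feature term occurs, so that unrelated mail
    does not collapse onto the digest of the empty string.
    """
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    present = sorted(tokens & feature_terms)  # order-independent key
    if not present:
        return None
    return hashlib.md5(" ".join(present).encode("utf-8")).hexdigest()


known_spam = "Click here to claim your FREE prize, winner!"
variant = "Dear friend: winner!! claim the free prize now, just click."
ham = "The meeting is moved to Friday; please confirm."

# Fingerprints of messages already judged to be spam.
blacklist = {fingerprint(known_spam, FEATURE_TERMS)}


def is_bulk_spam(text):
    """Check whether the message's fingerprint matches a known spam fingerprint."""
    fp = fingerprint(text, FEATURE_TERMS)
    return fp is not None and fp in blacklist


print(is_bulk_spam(variant))
print(is_bulk_spam(ham))
```

Matching a fingerprint is a single set lookup, which is one plausible reason a digest-based check would be cheap for bulk mailings. MD5 is used here only as a compact content identifier, not for any security property.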

Keywords: e-mail fingerprint; feature extraction; latent semantic analysis; MD5 algorithm; sliding window; spam filtering

Classification No.: TP393.098 [Automation and Computer Technology: Computer Application Technology]

 
