Research on a Spam Filter Based on an Improved Naive Bayesian Algorithm  Cited by: 27

Implementing Spam Filter by Improving Naive Bayesian Algorithm


Authors: Zheng Wei (郑炜)[1], Shen Wen (沈文)[1], Zhang Yingpeng (张英鹏)[2]

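Affiliations: [1] School of Software and Microelectronics, Northwestern Polytechnical University, Xi'an, Shaanxi 710072; [2] School of Information, Xi'an University of Finance and Economics, Xi'an, Shaanxi 710072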

Source: Journal of Northwestern Polytechnical University (《西北工业大学学报》), 2010, No. 4, pp. 622-627 (6 pages)

Abstract: Spam filters based on the naive Bayesian algorithm are among the more efficient and economical spam-filtering techniques currently available, and they are widely used in practice. Building on an analysis of the naive Bayesian filter, this paper addresses the algorithm's shortcomings by incorporating the idea of loss minimization and, taking the characteristics of spam into account, proposes an improved naive Bayesian algorithm. By adjusting the value of k, the improved algorithm lowers the probability that the filter misjudges a legitimate e-mail as spam, thereby minimizing the economic loss to users. Section 1 of the full paper analyzes the classification deficiencies of the naive Bayesian algorithm. Section 2 implements the spam filter by improving the naive Bayesian algorithm, obtaining the k value as shown in eq. (8). Section 3 tests the spam filter by adjusting the k value of the improved algorithm; the test results, presented in Table 2, and their comparison, given in Figs. 1, 2 and 3, show preliminarily that a spam filter using the improved algorithm can increase the recall rate by 10% and the accuracy by 5%, thus effectively decreasing the probability of misjudging legitimate e-mails as spam.
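The role of k can be illustrated with a small sketch. The abstract does not reproduce eq. (8), so the rule used here is an assumption and not the authors' formula: a common cost-sensitive variant of naive Bayes flags a message as spam only when the posterior odds P(spam | message) / P(legitimate | message) exceed a threshold k, so that a larger k makes false positives on legitimate mail less likely. The function names (`train`, `log_odds`, `is_spam`), the `"spam"`/`"ham"` labels, and the Laplace smoothing are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def train(messages):
    """Count words per class. messages: list of (tokens, label), label in {"spam", "ham"}.

    Assumes at least one message of each class is present.
    """
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for tokens, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(tokens)
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def log_odds(tokens, model):
    """Return log[P(spam | tokens) / P(ham | tokens)] under naive Bayes.

    Uses add-one (Laplace) smoothing; tokens outside the vocabulary are ignored.
    """
    word_counts, doc_counts, vocab = model
    score = math.log(doc_counts["spam"] / doc_counts["ham"])  # prior odds
    for label, sign in (("spam", 1.0), ("ham", -1.0)):
        total = sum(word_counts[label].values())
        for tok in tokens:
            if tok in vocab:
                p = (word_counts[label][tok] + 1) / (total + len(vocab))
                score += sign * math.log(p)
    return score


def is_spam(tokens, model, k=1.0):
    """Assumed decision rule: flag as spam only if the posterior odds exceed k."""
    return log_odds(tokens, model) > math.log(k)
```

With k = 1 this reduces to the ordinary naive Bayes decision; raising k trades some missed spam for fewer legitimate messages being blocked, which is the kind of trade-off the abstract describes.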

Keywords: probability; naive Bayes; spam filter

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
