E-mail categorization and filtering technology based on essential emerging pattern  Cited by: 3


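Authors: Li Yan[1], Fan Ming[1]

Affiliation: [1] Department of Computer Science, Zhengzhou University, Zhengzhou 450052

Source: Journal of Nanjing University (Natural Science), 2008, No. 5, pp. 544-550 (7 pages)

Funding: National Natural Science Foundation of China (60773048)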

Abstract: The spam problem is growing increasingly serious and has attracted wide attention from researchers. Content-based classification and filtering of spam is one of the mainstream techniques for addressing it. This paper studies e-mail content in depth and proposes a new feature extraction method better suited to spam classification. The new method is combined with CeEP, a classification algorithm based on the essential emerging pattern (eEP), and applied to spam detection, yielding an eEP-based e-mail categorization and filtering algorithm (the e-mail categorization and filtering technology based on eEP, ECFEP). Experiments show that combining the new feature extraction method with the CeEP classification algorithm is a highly efficient classification approach, and that the classification efficiency of ECFEP is higher than that of several of the better current classification algorithms. The volume of junk e-mail on the Internet has grown tremendously in the past few years. Spam volume now exceeds the number of normal e-mails, which is causing serious problems. Content-based filtering is one of the mainstream technologies used so far. At present, e-mail feature extraction mainly relies on feature extraction methods borrowed from text classification. However, our analysis found that e-mail content has its own unique characteristics, so using only text classification feature extraction methods causes problems and reduces the efficiency of classification. Categorization methods based on emerging patterns (EPs) view samples as sets of items instead of points in an n-dimensional space. EPs are itemsets whose supports change significantly from one data class to another. They can serve as a good classification model because they capture the inherent distinctions between different classes of data and represent knowledge that discriminates between those classes, so EPs are useful in building accurate classifiers. The essential emerging pattern (eEP) is a special kind of EP. eEPs not only retain all the virtues of EPs that make them useful for constructing accurate classifiers, but are also fewer in number, which makes them efficient to mine and use. Categorization methods based on EPs perform comparably to the C4.5 and Naive Bayes methods.
Categorization methods based on EPs have been applied successfully in many fields, such as DNA analysis, but we have not seen reports of applying them to e-mail categorization and filtering. Based on a study of e-mail content and in view of its uniqueness, this paper preprocesses the text of the e-mail and proposes a new spam feature extraction method that is more appropriate to e-mail classification. This paper uses the classification algorithm based on essential emerging patterns, which is data mining researchers' new classif
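The abstract describes EPs as itemsets whose supports change significantly from one class to another. A minimal Python sketch of that idea follows, assuming the usual growth-rate formulation (support in the target class divided by support in the background class). The function names, the `min_growth` threshold of 2.0, and the toy token sets are all hypothetical illustrations; this is not the paper's CeEP or ECFEP algorithm, and it does not mine patterns, it only scores a given one.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support(pattern: Itemset, samples: List[Itemset]) -> float:
    """Fraction of samples that contain every item of the pattern."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if pattern <= s) / len(samples)


def growth_rate(pattern: Itemset, background: List[Itemset],
                target: List[Itemset]) -> float:
    """Support in the target class divided by support in the background class."""
    sup_bg = support(pattern, background)
    sup_tg = support(pattern, target)
    if sup_bg == 0.0:
        # Pattern never occurs in the background class.
        return float("inf") if sup_tg > 0.0 else 0.0
    return sup_tg / sup_bg


def is_emerging(pattern: Itemset, background: List[Itemset],
                target: List[Itemset], min_growth: float = 2.0) -> bool:
    """True if the pattern's support grows by at least min_growth (assumed threshold)."""
    return growth_rate(pattern, background, target) >= min_growth


# Toy data (hypothetical): each e-mail is reduced to a set of tokens.
spam = [
    frozenset({"free", "winner", "click"}),
    frozenset({"free", "click"}),
    frozenset({"meeting", "free"}),
    frozenset({"winner", "prize"}),
]
ham = [
    frozenset({"meeting", "agenda"}),
    frozenset({"free", "lunch"}),
    frozenset({"report", "agenda"}),
    frozenset({"meeting", "report"}),
]

for pattern in (frozenset({"free"}), frozenset({"free", "click"}), frozenset({"meeting"})):
    rate = growth_rate(pattern, ham, spam)
    print(sorted(pattern), rate, is_emerging(pattern, ham, spam))
```

In this toy data, `{"free"}` appears in three of four spam messages but only one of four normal ones, so its support grows from the normal class to the spam class, while `{"meeting"}` moves in the opposite direction and would not count as a spam EP under the assumed threshold.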

Keywords: e-mail categorization; feature extraction; essential emerging pattern

Classification number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
