Research on an Email Filtering System Based on Category Feature Selection and Feedback Learning Random Forest Algorithm    Cited by: 1

ON EMAIL FILTERING SYSTEM BASED ON CATEGORY FEATURE SELECTION AND FEEDBACK LEARNING RANDOM FOREST ALGORITHM

Authors: Sun Xue[1], Han Lei[1], Li Kunlun[2]

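Affiliations: [1] Industrial and Commercial College, Hebei University, Baoding, Hebei 071000, China; [2] College of Electronic and Information Engineering, Hebei University, Baoding, Hebei 071002, China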

Source: Computer Applications and Software (《计算机应用与软件》), 2015, No. 4, pp. 67-71 (5 pages)

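Funding: National Natural Science Foundation of China (60773062; 61073121); Hebei Province Science and Technology Support Program (072135188); Hebei University Youth Fund (2010Q17)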

Abstract: To address three problems common in email filtering systems, namely the curse of dimensionality, topic differences between email categories, and missing feedback information, this paper proposes an email filtering model based on category feature selection and a feedback learning random forest algorithm. The method introduces the latent Dirichlet allocation (LDA) model into the feature selection stage: a separate generative model is built on each type of email set, and the feature information that makes up each category's topics is searched for separately, which reduces the impact of redundant information and noisy data on classification performance. The feedback learning random forest algorithm combines the strengths of decision tree ensembles and feedback learning, so the filtering system can adjust itself and promptly track changes in spam. The method was tested on the public corpora CCERT and Trec06 and compared with typical algorithms; the experimental results demonstrate the feasibility and effectiveness of the proposed algorithm.
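The abstract does not say how the per-category LDA models are inferred or how features are chosen from them. The following is a minimal sketch under stated assumptions: LDA is fitted with collapsed Gibbs sampling, one model per email category, and the selected feature set is the union of the `top_n` most probable words of every topic. The function names (`lda_gibbs`, `category_features`, `select_features`) and the default hyperparameters are hypothetical, not the authors' settings.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA on token lists; returns topic-word counts."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})                 # vocabulary size
    n_dt = [[0] * num_topics for _ in docs]               # doc-topic counts
    n_tw = [defaultdict(int) for _ in range(num_topics)]  # topic-word counts
    n_t = [0] * num_topics                                # tokens per topic
    z = []                                                # topic of each token
    for d, doc in enumerate(docs):                        # random initialization
        zd = []
        for w in doc:
            t = rng.randrange(num_topics)
            zd.append(t)
            n_dt[d][t] += 1
            n_tw[t][w] += 1
            n_t[t] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove the current token from all counts.
                n_dt[d][t] -= 1
                n_tw[t][w] -= 1
                n_t[t] -= 1
                # Full conditional p(z = k | rest), up to a normalizing constant.
                weights = [
                    (n_dt[d][k] + alpha) * (n_tw[k][w] + beta) / (n_t[k] + V * beta)
                    for k in range(num_topics)
                ]
                t = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = t
                n_dt[d][t] += 1
                n_tw[t][w] += 1
                n_t[t] += 1
    return n_tw, n_t, V


def category_features(docs, num_topics=2, top_n=3, beta=0.01, **kw):
    """Fit LDA on one category and keep the top_n words (by phi) of each topic."""
    n_tw, n_t, V = lda_gibbs(docs, num_topics, beta=beta, **kw)
    selected = set()
    for k in range(num_topics):
        phi = {w: (c + beta) / (n_t[k] + V * beta) for w, c in n_tw[k].items()}
        selected.update(sorted(phi, key=lambda w: (-phi[w], w))[:top_n])
    return selected


def select_features(corpus_by_label, **kw):
    """Build a separate generative model per category and merge the selected words."""
    features = set()
    for docs in corpus_by_label.values():
        features |= category_features(docs, **kw)
    return sorted(features)
```

A toy run on a hypothetical two-category corpus:

```python
corpus = {
    "spam": [["free", "prize", "win", "click"], ["win", "free", "money"]],
    "ham": [["meeting", "project", "report"], ["report", "deadline", "meeting"]],
}
print(select_features(corpus, num_topics=2, top_n=3, iters=50))
```

Fitting each category separately is what lets topic words that are rare overall but characteristic of one class (for example spam vocabulary) survive selection instead of being swamped by the larger class.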
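The abstract also does not specify how user feedback is fed back into the random forest. One possible sketch, assuming binary word-presence features over the selected vocabulary, Gini-split trees that consider a random subset of about sqrt(d) features at each node, bootstrap sampling per tree, and a hypothetical feedback rule: each corrected email is added to the training set and the `refresh` oldest trees are regrown on the enlarged set. `FeedbackForest`, `build_tree`, and `refresh` are illustrative names, not the authors' method.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def build_tree(X, y, depth, max_depth, n_sub, rng):
    """Grow a tree on 0/1 features; each node tries only n_sub random features."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    best = None
    for f in rng.sample(range(len(X[0])), n_sub):
        left = [i for i, x in enumerate(X) if x[f] == 0]
        right = [i for i, x in enumerate(X) if x[f] == 1]
        if not left or not right:
            continue
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(y)
        if best is None or score < best[0]:
            best = (score, f, left, right)
    if best is None:                      # no usable split among sampled features
        return majority
    _, f, left, right = best
    grow = lambda idx: build_tree([X[i] for i in idx], [y[i] for i in idx],
                                  depth + 1, max_depth, n_sub, rng)
    return (f, grow(left), grow(right))


def predict_tree(node, x):
    while isinstance(node, tuple):        # internal node: (feature, left, right)
        f, left, right = node
        node = right if x[f] == 1 else left
    return node                           # leaf: class label


class FeedbackForest:
    def __init__(self, n_trees=15, max_depth=4, refresh=3, seed=0):
        self.n_trees, self.max_depth, self.refresh = n_trees, max_depth, refresh
        self.rng = random.Random(seed)
        self.X, self.y, self.trees = [], [], []

    def _grow(self):
        idx = [self.rng.randrange(len(self.X)) for _ in self.X]  # bootstrap sample
        n_sub = max(1, int(len(self.X[0]) ** 0.5))                # ~sqrt(d) features
        return build_tree([self.X[i] for i in idx], [self.y[i] for i in idx],
                          0, self.max_depth, n_sub, self.rng)

    def fit(self, X, y):
        self.X, self.y = [list(x) for x in X], list(y)
        self.trees = [self._grow() for _ in range(self.n_trees)]
        return self

    def predict(self, x):
        return Counter(predict_tree(t, x) for t in self.trees).most_common(1)[0][0]

    def feedback(self, x, true_label):
        """Store a user-corrected email and replace the oldest trees with new ones."""
        self.X.append(list(x))
        self.y.append(true_label)
        self.trees = self.trees[self.refresh:] + [self._grow() for _ in range(self.refresh)]
```

Replacing only a few trees per correction keeps each update cheap while, over repeated feedback, the whole ensemble gradually shifts toward the latest spam patterns, which is one way to read the "self-regulation" described in the abstract.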

Keywords: LDA model; feature selection; feedback learning; random forest algorithm; spam filtering

CLC number: TP3 [Automation and Computer Technology: Computer Science and Technology]