Web Topic Classification Based on a Modified Multi-Classifier Integration AdaBoost Algorithm  Cited by: 2


Authors: Wu Jiehua (伍杰华) [1], Ni Zhensheng (倪振声) [2]

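Affiliations: [1] Department of Computer Engineering, Guangdong Polytechnic of Industry and Commerce, Guangzhou 510510, Guangdong; [2] School of Information Science and Technology, Sun Yat-sen University, Guangzhou 510006, Guangdong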

Source: Computer Applications and Software (《计算机应用与软件》), 2013, No. 11, pp. 64-67 (4 pages)

Funding: National Natural Science Foundation of China (61003045)

Abstract: Existing Web topic classification algorithms are generally built on a single model, or simply superimpose several single models to reach a decision. To address this problem, we propose a Web topic classification method based on a modified multi-classifier integration AdaBoost algorithm. The method first uses the VIPS algorithm to segment a page into blocks and extract their visual and text features, and trains weak classifiers according to the dimensions of each feature type. It then computes the corresponding error rates and modifies the rejection strategy for erroneous decisions, producing an optimal classifier for each feature type. Finally, it makes a cascaded decision over the two types of optimal classifiers. Experimental results show that the method improves the classification accuracy of AdaBoost on complex Web topic information and offers a new approach for research on Web topic classification.
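The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes binary labels (+1/-1), uses standard discrete AdaBoost with one threshold stump per feature dimension, and stands in a simple confidence-margin rule (the hypothetical `reject_margin` parameter) for the paper's modified rejection strategy, whose details the abstract does not give. VIPS block extraction is not shown; the visual and text feature vectors are taken as given.

```python
import math

def train_stump(X, y, w, dim):
    """Find the best threshold/polarity stump on one feature dimension."""
    best = (float("inf"), 0.0, 1)  # (weighted error, threshold, polarity)
    for thr in sorted({row[dim] for row in X}):
        for pol in (1, -1):
            err = sum(wi for row, yi, wi in zip(X, y, w)
                      if (pol if row[dim] >= thr else -pol) != yi)
            if err < best[0]:
                best = (err, thr, pol)
    return best

def adaboost(X, y, rounds=10):
    """Discrete AdaBoost; each round keeps the best single-dimension stump."""
    n, dims = len(X), len(X[0])
    w = [1.0 / n] * n
    model = []  # entries: (alpha, dim, threshold, polarity)
    for _ in range(rounds):
        candidates = [(train_stump(X, y, w, d), d) for d in range(dims)]
        (err, thr, pol), dim = min(candidates, key=lambda c: c[0][0])
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        if err >= 0.5:  # no stump beats chance; stop early
            break
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, dim, thr, pol))
        preds = [pol if row[dim] >= thr else -pol for row in X]
        # Raise the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * pi) for wi, yi, pi in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def score(model, x):
    """Signed ensemble score normalized to [-1, 1]; 0.0 for an empty model."""
    total = sum(a for a, _, _, _ in model)
    if total == 0:
        return 0.0
    return sum(a * (pol if x[dim] >= thr else -pol)
               for a, dim, thr, pol in model) / total

def cascade_predict(visual_model, text_model, x_visual, x_text, reject_margin=0.2):
    """Assumed cascade rule: if the visual ensemble is not confident enough,
    reject its decision and defer to the text ensemble."""
    s = score(visual_model, x_visual)
    if abs(s) < reject_margin:
        s = score(text_model, x_text)
    return 1 if s >= 0 else -1

# Toy data (made up for illustration): one visual and two text features per page.
X_visual = [[0.1], [0.2], [0.8], [0.9]]
X_text = [[3, 0], [2, 1], [0, 4], [1, 5]]
labels = [-1, -1, 1, 1]

visual_model = adaboost(X_visual, labels, rounds=3)
text_model = adaboost(X_text, labels, rounds=3)
predictions = [cascade_predict(visual_model, text_model, xv, xt)
               for xv, xt in zip(X_visual, X_text)]
```

Training one ensemble per feature type and consulting the second only when the first is rejected keeps the two feature spaces separate, which is one plausible reading of the "cascading decision" described above.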

Keywords: Web topic; AdaBoost; classifier; classifier integration; feature classification; topic segmentation

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
