Classification Strategy for Focus Crawling Based on Multi-classifier Combination and Ranking Approach


Author: Qiao Jianzhong (乔建忠)[1]

Affiliation: [1] Information Management Center, PLA Academy of Art (解放军艺术学院)

Source: Library and Information Service (《图书情报工作》), 2013, No. 14, pp. 114-120 (7 pages)

Abstract: A single classification algorithm in a focused crawler generalizes poorly when the crawler must fetch and classify Web pages across multiple topics. To address this limitation, the paper proposes a strategy built on a combination of several strong classification algorithms: the focused crawler evaluates the classifiers online against the current topic task, ranks them, and selects the best-ranked classifier to classify Web pages. Classification experiments were run over multiple topic crawling tasks, comparing the accuracy of each individual classification algorithm with the average classification accuracy of the multi-classifier combination, together with a comprehensive analysis of evaluation indicators such as classification efficiency. The results indicate that the strategy overcomes domain locality to a certain extent and is more broadly applicable than a single classifier.
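The rank-and-select step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the classification algorithms or the ranking criterion, so everything below is an assumption made for illustration: the two toy classifiers (`keyword_clf`, `length_clf`) are hypothetical stand-ins for the "strong" classifiers, and accuracy on a small labeled sample for the current topic is assumed as the ranking score.

```python
from typing import Callable, Dict, List, Tuple

# A classifier maps page text to True (on-topic) or False (off-topic).
Classifier = Callable[[str], bool]
Sample = List[Tuple[str, bool]]


def accuracy(clf: Classifier, sample: Sample) -> float:
    """Fraction of labeled pages that clf classifies correctly."""
    if not sample:
        return 0.0
    correct = sum(1 for text, label in sample if clf(text) == label)
    return correct / len(sample)


def rank_classifiers(
    classifiers: Dict[str, Classifier], sample: Sample
) -> List[Tuple[str, float]]:
    """Score every classifier on the topic's sample, best first."""
    scores = [(name, accuracy(clf, sample)) for name, clf in classifiers.items()]
    # sorted() is stable, so ties keep the dict's insertion order.
    return sorted(scores, key=lambda item: item[1], reverse=True)


def select_best(classifiers: Dict[str, Classifier], sample: Sample) -> str:
    """Name of the top-ranked classifier for the current topic."""
    return rank_classifiers(classifiers, sample)[0][0]


# Hypothetical stand-ins for the strong classifiers (assumptions, not the
# algorithms used in the paper).
def keyword_clf(text: str) -> bool:
    return "crawler" in text.lower()


def length_clf(text: str) -> bool:
    return len(text.split()) > 5


classifiers = {"keyword": keyword_clf, "length": length_clf}

# Assumed labeled sample for the current topic task.
topic_sample = [
    ("A focused crawler downloads on-topic pages", True),
    ("Crawler scheduling", True),
    ("Recipe for a long slow cooked beef stew with vegetables", False),
    ("Weather", False),
]

best = select_best(classifiers, topic_sample)
print(best, rank_classifiers(classifiers, topic_sample))
```

Re-running the ranking whenever the topic task changes is what lets the crawler switch classifiers per topic; in practice the sample, the scoring metric, and the classifier pool would come from the crawler's own configuration.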

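Keywords: focused crawling technology; focused crawler; Web page classification; classification algorithm; multi-classifier combination; classification accuracy; classification efficiency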

Classification number: G356 [Cultural Sciences: Information Science]

 
