检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:乔建忠[1]
机构地区:[1]解放军艺术学院信息管理中心
出 处:《图书情报工作》2013年第14期114-120,共7页Library and Information Service
摘 要:针对主题爬行技术中的单一分类算法在面对多主题Web抓取和分类需求时泛化能力不强的局限,设计一种利用多种强分类算法形成的分类器组合,主题爬行器根据当前主题任务在线评估并为分类器排名,从中选择最优分类器分类的策略,并开展在多个主题抓取任务下的分类实验,比较每种分类算法的准确率和组合后的平均分类准确率以及对分类效率等评价指标的综合分析,结果证明该策略对领域局域性有所克服,普适性较强。For the limitation that generalization capacity of crawler is facing multi-topic Web crawling and classification, combination formed of multiple strong classification algorithms. online according to the current topic, and classifies Web pages single classification algorithm is not strong when focused the paper proposed a strategy of using multi-classifier The focused crawler evaluates and ranks the classifiers by selecting the better classifiers. Through classification experiments of multiple topics crawling tasks, comparing between accurate rate of each classification algorithm and average classification accurate rate of multi-classifier combination, and comprehensive analysis of the two indicators classification accuracy and classification efficiency, it proved the proposed method is better in universality, to a certain extent and overcomes the limitations of a single classifier.
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:3.14.252.84