Research on Adaptive Genetic Algorithm in Application of Focused Crawler Search Strategy  (Cited by: 7)

Authors: JING Wen-peng (荆文鹏)[1], WANG Yu-jian (王育坚)[1], DONG Wei-wei (董伟伟)[1]

Affiliation: [1] College of Information Technology, Beijing Union University, Beijing 100101, China

Source: Computer Science (《计算机科学》), 2016, No. 8, pp. 254-257 (4 pages)

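Funding: Supported by the National Natural Science Foundation of China, project "Research on Image Semi-structuring Based on Hypergraph XGML" (基于超图形XGML的图像半结构化研究), grant 61271369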

Abstract: Improving coverage and accuracy is one of the main research topics for focused crawlers. Most current focused crawlers use a best-first search strategy, which tends to become trapped in local optima. To address this weakness, the paper designs a focused crawler search strategy that incorporates a genetic algorithm, with a dynamic fitness function and genetic operators that give the crawler a degree of adaptivity. The strategy was compared with other search strategies and with a search strategy based on a non-adaptive genetic algorithm. The experimental results show that the proposed algorithm improves crawler performance to some extent.
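The contrast the abstract draws between best-first search and a genetic-algorithm-based strategy can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, since the record does not give the paper's fitness function or operators: the frontier is a hypothetical dictionary mapping URLs to topic-relevance scores, selection is plain roulette-wheel selection, and `adaptive_rate` follows a commonly used adaptive-probability rule in which individuals with above-average fitness receive a lower mutation or crossover probability.

```python
import random

def best_first(frontier):
    """Greedy choice: always expand the highest-scored URL."""
    return max(frontier, key=frontier.get)

def roulette_select(frontier, rng):
    """Fitness-proportional choice: low-scored URLs keep a small chance.

    This is what lets a GA-style crawler leave a locally optimal region
    that a purely greedy best-first crawler would keep exploiting.
    """
    urls = list(frontier)
    weights = [frontier[u] for u in urls]
    return rng.choices(urls, weights=weights, k=1)[0]

def adaptive_rate(fitness, f_avg, f_max, high=0.5, low=0.05):
    """Assumed adaptive rule (not taken from the paper).

    Below-average individuals get the full rate `high`; above-average
    ones get a rate that shrinks linearly toward `low` as their fitness
    approaches the population maximum.
    """
    if f_max == f_avg or fitness < f_avg:
        return high
    return max(low, high * (f_max - fitness) / (f_max - f_avg))

# Hypothetical frontier: URL -> relevance score in (0, 1].
frontier = {
    "http://example.com/a": 0.7,
    "http://example.com/b": 0.2,
    "http://example.com/c": 0.1,
}
rng = random.Random(0)
greedy_pick = best_first(frontier)
sampled_picks = {roulette_select(frontier, rng) for _ in range(200)}
```

Under these assumptions, `best_first` returns the same top-scored URL on every call, while `roulette_select` spreads its choices across the frontier in proportion to score. The `high` and `low` bounds are illustrative defaults, not values reported in the paper.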

Keywords: focused crawler; importance degree; genetic algorithm; genetic operator; fitness function

Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]

 
