Research on a Web Topic Information Collection Strategy Based on Comprehensive Value  Cited by: 1

Research of Searching Strategy in Web Topic Crawler


Authors: 张玲[1] 林亚平[1] 陈治平[1] 童调生[2]

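Affiliations: [1] College of Computer and Communication, Hunan University, Changsha, Hunan 410082; [2] College of Electrical and Information Engineering, Hunan University, Changsha, Hunan 410082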

Source: Journal of System Simulation (《系统仿真学报》), 2005, No. 2, pp. 323-326 (4 pages)

Funding: National Natural Science Foundation of China (60272051)

Abstract: Heuristic topic-specific Web crawlers decide the order in which to visit pages by link importance, so how link value is assessed is the key to their search strategy. With the rapid growth of the World Wide Web, such a crawler must find pages relevant to predefined topics among an ever larger number of pages, and the central problem is assigning appropriate credit to different links. Two kinds of methods are commonly used to evaluate a link's credit: one is based on the link's immediate reward, the other on its future reward. Each has limitations in certain situations. This paper proposes a crawling strategy based on a comprehensive value that combines the two rewards to evaluate links. The changes in the rewards are used to estimate how relevant the candidate page set is to the topics, and on that basis the crawler dynamically adjusts the relationship between the two rewards so that it can predict page importance more accurately. Experiments show that the new algorithm achieves higher search efficiency than some traditional algorithms.

Keywords: web crawler (spider); search strategy; immediate reward; future reward

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
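
The abstract describes combining a link's immediate and future rewards and adjusting their relationship as the crawl proceeds. The record does not give the paper's formulas, so the following is only a minimal sketch under stated assumptions: the linear blend in `combined_score`, the threshold rule in `adjust_weight`, and every name and parameter (`weight`, `step`, `threshold`, `window`, `out_links`, `relevance`) are hypothetical illustrations, not the authors' method.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Rewards = Tuple[float, float]  # (immediate reward, future reward) of a link


def combined_score(immediate: float, future: float, weight: float) -> float:
    """Assumed linear blend: weight=1.0 trusts only the immediate reward."""
    return weight * immediate + (1.0 - weight) * future


def adjust_weight(weight: float, recent: Sequence[float],
                  step: float = 0.1, threshold: float = 0.5) -> float:
    """Hypothetical adjustment rule (not from the paper).

    If recently fetched pages were mostly on topic, lean toward the
    immediate reward; otherwise lean toward the future reward.
    """
    if not recent:
        return weight
    mean = sum(recent) / len(recent)
    weight = weight + step if mean >= threshold else weight - step
    return min(1.0, max(0.0, weight))  # keep the weight in [0, 1]


def crawl(seeds: Dict[str, Rewards],
          out_links: Callable[[str], Dict[str, Rewards]],
          relevance: Callable[[str], float],
          budget: int = 10, window: int = 3) -> List[str]:
    """Best-first crawl that rescans the frontier with the current weight."""
    frontier = dict(seeds)            # url -> (immediate, future)
    visited: set = set()
    order: List[str] = []
    recent: List[float] = []          # relevance of the last `window` pages
    weight = 0.5
    while frontier and len(order) < budget:
        # Pick the link whose blended score is highest under the current weight.
        url = max(frontier, key=lambda u: combined_score(*frontier[u], weight))
        del frontier[url]
        visited.add(url)
        order.append(url)
        recent = (recent + [relevance(url)])[-window:]
        weight = adjust_weight(weight, recent)
        # Queue newly discovered links that are neither visited nor queued.
        for link, rewards in out_links(url).items():
            if link not in visited and link not in frontier:
                frontier[link] = rewards
    return order
```

Rescanning the whole frontier on every step keeps the sketch short and ensures each choice uses the current weight; a real crawler would more likely use a priority queue and re-score entries when the weight changes.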

 
