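Affiliation: [1] School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu 212003, China
Source: Electronic Design Engineering (《电子设计工程》), 2015, No. 6, pp. 30-32 (3 pages)
Funding: Zhenjiang Social Development Project (SH2013015)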
Abstract: In the era of information explosion, the results returned by general-purpose search engines no longer meet users' needs, and vertical search engines, which can retrieve more accurate and comprehensive information, are attracting growing attention. The topic crawler, as the core component of a vertical search engine, has long been a research focus in the search field. Building on the open-source web crawler Heritrix, this paper analyzes its structure and working principles, introduces an improvement to its multithreaded processing, and designs a topic crawler whose performance is tested in a single-machine environment. The experimental results show that the topic crawler achieves a relatively high recall, laying a solid foundation for further research and development of efficient vertical search engines.
Classification code: TN91 [Electronics and Telecommunications: Communication and Information Systems]