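Affiliations: [1] School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044; [2] Jiangsu Network Monitoring Center, Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044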
Source: Computer Engineering and Design, 2016, No. 2, pp. 313-318, 362 (7 pages)
Funding: National Natural Science Foundation of China (41430427); Jiangsu Province Qinglan Project Fund (2012)
Abstract: To improve the efficiency of crawling network resources held on a massive number of DHT nodes, a DHT crawler based on routing table injection is proposed. Exploiting the characteristics of the Kademlia algorithm, the crawler computes a query target ID for each interval of a routing table, retrieves all node information stored by a known node, and thereby traverses nodes faster. While interacting with network nodes, it generates crawler node IDs adapted to a known node's routing table so that it is injected into that table, which lets it crawl the node's resources continuously. Experimental results show that the method identifies the best injection interval of the routing table and raises the injection success rate, improving the efficiency of acquiring DHT network resources. The method has been applied successfully on the Btbook website.
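The abstract does not include code. The following is a minimal Python sketch of the two Kademlia ideas it relies on. It assumes the classic layout, in which bucket i holds the nodes whose XOR distance from the table owner lies in [2**i, 2**(i+1)), with 160-bit IDs and k = 8 as in the BitTorrent DHT (BEP 5). `id_in_bucket` builds a find_node target inside a chosen interval of a known node's table. The same construction also produces a crawler ID aimed at a chosen bucket for injection. All function names, and the `pick_injection_bucket` rule that selects a bucket from an estimated network size, are hypothetical illustrations rather than the paper's method. The paper determines its best injection interval experimentally. The sketch only computes IDs and sends no network messages.

```python
import secrets

ID_BITS = 160  # Kademlia / BitTorrent DHT node IDs are 160-bit integers
K = 8          # bucket capacity used by the BitTorrent DHT (BEP 5)


def bucket_index(own_id: int, other_id: int) -> int:
    """Bucket i of own_id's table that would hold other_id, i.e. the i with
    2**i <= own_id ^ other_id < 2**(i + 1). Returns -1 if the IDs are equal."""
    return (own_id ^ other_id).bit_length() - 1


def id_in_bucket(node_id: int, i: int) -> int:
    """Random ID whose XOR distance to node_id lies in [2**i, 2**(i + 1)),
    so it falls into bucket i of node_id's routing table."""
    if not 0 <= i < ID_BITS:
        raise ValueError("bucket index out of range")
    low = secrets.randbits(i) if i > 0 else 0  # random bits below bit i
    return node_id ^ ((1 << i) | low)          # flip bit i, randomize the bits below it


def crawl_targets(node_id: int, buckets=range(ID_BITS)) -> dict:
    """One find_node target per routing-table interval of a known node.
    Members of bucket i are strictly closer (by XOR) to a target in interval i
    than any other table entry, so a k-closest reply returns that bucket first."""
    return {i: id_in_bucket(node_id, i) for i in buckets}


def pick_injection_bucket(network_size: int, k: int = K) -> int:
    """Hypothetical heuristic, not the paper's rule: bucket i covers a
    2**(i - ID_BITS) share of the ID space, so it holds about
    network_size * 2**(i - ID_BITS) nodes on average. Return the highest
    bucket expected to hold fewer than k nodes, i.e. likely to have room."""
    for i in range(ID_BITS - 1, -1, -1):
        if network_size * 2.0 ** (i - ID_BITS) < k:
            return i
    return 0


if __name__ == "__main__":
    known = secrets.randbits(ID_BITS)      # stand-in for a node ID learned while crawling
    targets = crawl_targets(known)         # 160 find_node targets, one per interval
    bucket = pick_injection_bucket(2**23)  # assumed network size of about 8.4 million nodes
    crawler_id = id_in_bucket(known, bucket)
    print(f"{crawler_id:040x} lands in bucket {bucket_index(known, crawler_id)}")
```

With the assumed 2**23 nodes, bucket i is expected to hold about 2**(i - 137) nodes. `pick_injection_bucket(2**23)` therefore returns 139, which has about 4 expected entries (below k = 8), while bucket 140 is expected to be exactly full. A real crawler would pair these IDs with BEP 5 `find_node` queries.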
Keywords: DHT network; routing injection; web crawler; Kademlia algorithm; Btbook
CLC number: TP393.0 [Automation and Computer Technology / Computer Application Technology]