Authors: XIE Rong-rong (谢蓉蓉) [1], XU Hui (徐慧) [2], ZHENG Shuai-wei (郑帅位), MA Gang (马刚) [1]
Affiliations: [1] School of Computer Science, Xi'an Shiyou University, Xi'an, Shaanxi 710065, China; [2] School of Petroleum Engineering, Xi'an Shiyou University, Xi'an, Shaanxi 710065, China; [3] Information Centre, Xi'an Shiyou University, Xi'an, Shaanxi 710065, China
Source: Computer Simulation (《计算机仿真》), 2021, No. 6, pp. 439-443 (5 pages)
Abstract: To improve the efficiency of crawling large-scale web data and to reduce the large errors of traditional crawling methods, this paper proposes a web big data crawling method based on a web crawler. First, the basic operating workflow of a web crawler is analyzed, and the key features of the data are extracted following that workflow. A crawler-based data crawling strategy is then proposed from the feature extraction results. Once the key data features have been computed, a breadth-first strategy is selected to crawl the data. The crawler dimension is obtained by reconstructing the phase space, and the correlation dimension value is introduced to complete the crawling of the web big data and of its key features. Simulation results show that the proposed method achieves a better crawl rate and shorter time consumption than the other methods compared, and that it is more robust.
Classification code: TP309.2 [Automation and Computer Technology: Computer System Architecture]
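The abstract names a breadth-first strategy for choosing which pages to crawl but gives no implementation details. The following is a minimal sketch of breadth-first visiting order only, not the authors' method. It assumes the link structure is already available as an in-memory dictionary instead of being fetched over the network, and the names `bfs_crawl_order`, `link_graph`, `max_pages`, and `toy_graph` are hypothetical.

```python
from collections import deque


def bfs_crawl_order(link_graph, seed, max_pages=100):
    """Return pages in the order a breadth-first crawler would visit them.

    link_graph: dict mapping a page name to the list of pages it links to
                (a stand-in for fetching a page and parsing its links).
    seed:       the page the crawl starts from.
    max_pages:  upper bound on the number of pages visited.
    """
    visited = {seed}          # pages already queued, to avoid revisiting
    queue = deque([seed])     # FIFO frontier gives breadth-first order
    order = []
    while queue and len(order) < max_pages:
        page = queue.popleft()
        order.append(page)
        for link in link_graph.get(page, []):
            if link not in visited:
                visited.add(link)
                queue.append(link)
    return order


# Hypothetical toy link graph, for illustration only.
toy_graph = {
    "seed": ["a", "b"],
    "a": ["c", "seed"],
    "b": ["c", "d"],
    "c": [],
    "d": ["a"],
}

print(bfs_crawl_order(toy_graph, "seed"))
```

With a FIFO frontier, every page one link away from the seed is visited before any page two links away, which is the property that distinguishes breadth-first from depth-first crawling.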