The Mode of Automatically Crawling Web Data and its Open Source Solutions for Researchers

Cited by: 9

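Affiliations: [1] School of Information Management, Central China Normal University, Wuhan 430079; [2] Key Laboratory of Adolescent Cyberpsychology and Behavior (Ministry of Education), Central China Normal University, Wuhan 430079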

Source: Journal of Information Resources Management, 2015, No. 2, pp. 21-27 (7 pages)

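Funding: One of the outcomes of the National Natural Science Foundation of China project "Research on SaaS Service Selection Optimization Based on User Preference Awareness" (71271099) and the Hubei Provincial Natural Science Foundation Innovation Group Key Project "Research on Cloud-Computing-Based Knowledge Integration and Services" (2011CDA116)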

Abstract: In the era of big data, research competition is a competition over data: obtaining high-quality data often determines the quality of research conclusions and even the success or failure of a project. However, no systematic study has yet addressed the automatic crawling of Web data by researchers. This paper analyzes the basic patterns of data crawling and identifies four basic modes of Web data crawling for researchers, together with their technical difficulties: the single-site static crawl mode, the cross-site static crawl mode, the single-site dynamic crawl mode, and the cross-site dynamic crawl mode. It also proposes two open-source solutions for automatic Web data crawling by researchers: building on an existing open-source crawler, and developing a custom crawler. Finally, it discusses the software architecture of each solution in detail and gives a basic code framework.

Keywords: researchers; Web data crawling; technical solutions; open-source software

Classification: TP311.5 [Automation and Computer Technology: Computer Software and Theory]
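The abstract distinguishes single-site from cross-site crawling and names a custom crawler as one of the two solutions. The following is a minimal sketch of that idea, not the code framework given in the paper. It assumes that "single-site" means staying on the seed URL's host and "cross-site" means following links to other hosts. The static versus dynamic distinction is not modeled, since the abstract does not define it. All names (`LinkParser`, `fetch_html`, `crawl`, `single_site`, `max_pages`) are hypothetical, and only the Python standard library is used.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Download one page over HTTP (hypothetical default fetcher)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seed_url, fetch=fetch_html, single_site=True, max_pages=50):
    """Breadth-first crawl from seed_url; returns a dict {url: html}.

    single_site=True keeps only links on the seed's host (assumed meaning
    of "single-site"); single_site=False also follows other hosts.
    """
    seed_host = urlparse(seed_url).netloc
    queue = deque([seed_url])
    seen = {seed_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that cannot be fetched
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).scheme not in ("http", "https"):
                continue  # ignore mailto:, javascript:, etc.
            if single_site and urlparse(link).netloc != seed_host:
                continue  # stay on the seed host in single-site mode
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Passing a custom `fetch` function makes it possible to try the crawl logic against in-memory pages without touching the network; a real crawl would also need politeness controls such as rate limiting and robots.txt handling, which this sketch omits.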
