Authors: 常志军 (Chang Zhijun), 钱力 (Qian Liu)[1,2], 谢靖 (Xie Jing), 吴振新 (Wu Zhenxin)[1,2], 张鹄 (Zhang Hu), 于倩倩 (Yu Qianqian)[1], 王颖 (Wang Ying), 王永吉 (Wang Yongji) (National Science Library, Chinese Academy of Sciences, Beijing 100190, China; Department of Library Information and Archives Management, University of Chinese Academy of Sciences, Beijing 100190, China; Institute of Software, Chinese Academy of Sciences, Beijing 100190, China)
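Affiliations: [1] National Science Library, Chinese Academy of Sciences, Beijing 100190; [2] Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190; [3] Institute of Software, Chinese Academy of Sciences, Beijing 100190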
Source: 《数据分析与知识发现》 (Data Analysis and Knowledge Discovery), 2021, No. 3, pp. 69-77 (9 pages)
Abstract: [Objective] This study builds a big data platform for sci-tech literature to address three problems: storing and providing online access to massive numbers of article-level documents, governing large-scale data, and low service performance. [Methods] Building on distributed technology, we analyzed the characteristics and service orientation of sci-tech big data, took into account the available hardware resources such as servers and networks, adopted a co-tenant deployment strategy, and designed a sci-tech literature big data platform with a "5+2" overall architecture. [Results] We built a PB-level big data platform for sci-tech literature that stores 200 TB of data, 320 million literature entities, and 6 billion entity relationships. MapReduce-based metadata processing performance improved threefold, and a microservice-based knowledge service architecture was established. [Limitations] The platform does not implement a complete stream processing pipeline, so it cannot respond promptly to incremental data. [Conclusions] The platform now supports the product line of the National Science Library, Chinese Academy of Sciences, including its knowledge discovery platform and the 慧科研 intelligent scientific research system. It delivers good online service and improves the processing, computing, and service capabilities for sci-tech literature data.