Data Sharing Trusted Data Lake Platform Based on Object Deputy

Cited by: 6

Authors: YANG Wen-zhe; HAO Yuan-ke; ZHAO Chang-sheng; SONG Wei [1]; YANG Xian-di [1]; PENG Zhi-yong [1,2]

Affiliations: [1] School of Computer Science, Wuhan University, Wuhan 430000, China; [2] Big Data Institute, Wuhan University, Wuhan 430000, China

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2023, No. 6, pp. 1324-1328 (5 pages)

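Funding: Supported by the National Key Research and Development Program of China (2020YFC1522602) and the National Natural Science Foundation of China (U1811263, 62072349).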

Abstract: With the rapid development of data-driven scientific research and the growing push toward data sharing, the construction of scientific data management platforms has received increasing attention. However, as research data grow in volume and diversify in form, traditional scientific data management platforms can no longer meet users' personalized needs for data organization services. Meanwhile, the data lake, a new kind of centralized data repository, has attracted wide attention from industry and academia: it allows data to be ingested from multiple sources and stored in native formats. Building on the data lake architecture, this paper designs and implements a personalized, trusted data lake platform for big data sharing based on an object deputy database. The platform supports the storage of multi-source, heterogeneous native data and provides efficient data storage and management functions such as metadata management and dataset retrieval. Based on the object deputy data model, appropriate source classes and deputy classes are designed; combined with the update and migration mechanism of the object deputy database, they realize personalized data space management and the automatic pushing of datasets to users. With regard to data security, data deduplication is used to remove duplicate data, which greatly reduces storage consumption.

Keywords: data sharing; data lake; personalized data space management; object deputy data model; data deduplication

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
