Survey of large-scale Resource Description Framework data partitioning methods in distributed environments  Cited by: 6


Authors: YANG Cheng; LU Jiamin[1]; FENG Jun[1] (College of Computer and Information, Hohai University, Nanjing, Jiangsu 211100, China)

Affiliation: [1] College of Computer and Information, Hohai University, Nanjing 211100

Source: Journal of Computer Applications (《计算机应用》), 2020, No. 11, pp. 3184-3191 (8 pages)

Funding: National Key Research and Development Program of China (2017YFC0405806, 2018YFC0407901).

Abstract: With the rapid development of knowledge graphs and their wide use in vertical domains, efficient processing of Resource Description Framework (RDF) data has become a new topic in modern big data management. RDF is a data model proposed by the W3C for describing knowledge graph entities and the relationships between them. To cope with the storage and querying of large-scale RDF data, many researchers have considered managing RDF data in a distributed environment. The key problem in the distributed storage of RDF data is data partitioning, and the partitioning result largely determines the performance of SPARQL (SPARQL Protocol and RDF Query Language) queries. From the perspective of data partitioning, this survey focuses on two classes of methods and describes them in depth: graph structure-based RDF data partitioning and semantics-based RDF data partitioning. The former includes multi-granularity hierarchical partitioning, template partitioning and clustering partitioning, and suits general-domain queries whose semantic scope is broad; the latter includes hash partitioning, vertical partitioning and pattern partitioning, and is better suited to vertical-domain queries whose semantic scope is relatively fixed. In addition, several typical partitioning methods are compared and analyzed to provide a reference for future research on RDF data partitioning. Finally, future research directions for RDF data partitioning methods are summarized.
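As a rough illustration of two of the methods the abstract names, the sketch below applies hash partitioning and vertical partitioning to a handful of toy triples. It is a minimal sketch and not the implementation of any system covered by the survey. It assumes that hash partitioning hashes the subject and that vertical partitioning builds one table per predicate, which are common variants. The sample triples, the `ex:` prefix, the function names and the choice of CRC32 as the hash function are all hypothetical.

```python
import zlib
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object); not taken from the paper.
TRIPLES = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:worksAt", "ex:hohai"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:carol", "ex:worksAt", "ex:hohai"),
]


def hash_partition(triples, num_nodes):
    """Assign each triple to a node by hashing its subject (assumed variant)."""
    parts = defaultdict(list)
    for s, p, o in triples:
        # CRC32 is used only because it is deterministic across runs.
        node = zlib.crc32(s.encode("utf-8")) % num_nodes
        parts[node].append((s, p, o))
    return dict(parts)


def vertical_partition(triples):
    """Group triples by predicate: one (subject, object) table per predicate."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)


if __name__ == "__main__":
    print(hash_partition(TRIPLES, num_nodes=2))
    print(vertical_partition(TRIPLES))
```

Hashing on the subject places every triple of a given subject on the same node, so a query pattern centered on one subject can be matched without cross-node joins. Grouping by predicate instead lets a query that names a fixed predicate scan only that predicate's table.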

Keywords: Resource Description Framework (RDF); data partitioning; distributed RDF data storage; SPARQL query; distributed database

Classification number: TP311.133.1 [Automation and Computer Technology: Computer Software and Theory]

 
