A SPARQL Query Algorithm on Massive RDF Data Based on Spark

Times cited: 1


Authors: Cui Jiaqi, Yan Wei

Affiliation: College of Information, Liaoning University, Shenyang 110036, Liaoning, China

Source: Computer Applications and Software, 2020, No. 12, pp. 26-31, 45 (7 pages)

Abstract: Massive RDF data is difficult to manage and query on a single machine. To address this problem, we propose SSQ, a Spark-based SPARQL query method that translates SPARQL queries into RDD operations executable on the Spark distributed platform. The data graph and the query graph are partitioned effectively to increase parallelism and reduce communication overhead between partitions. A predicate index reduces the search space, and join optimization reduces the number of matches, improving query efficiency. The algorithm was implemented on a Spark cluster, tested on the synthetic LUBM dataset, and compared with existing methods. The results show that it executes complex SPARQL queries quickly and scales well.

Keywords: RDF data; SPARQL query; Spark distributed platform; balanced semantic partitioning; communication overhead

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]
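The abstract names two of SSQ's ideas, a predicate index that shrinks the search space and join optimization that cuts the number of matches, without giving their details. The following is a minimal single-machine Python sketch of that general technique under our own assumptions, not the authors' Spark implementation: triples are grouped by predicate, each triple pattern with a constant predicate scans only that predicate's bucket, and the per-pattern results are hash-joined on shared variables, smallest result first. All function names, the `?`-prefix convention for variables, and the toy LUBM-style triples are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]
PredicateIndex = Dict[str, List[Tuple[str, str]]]


def build_predicate_index(triples: List[Triple]) -> PredicateIndex:
    """Group (subject, object) pairs by predicate (hypothetical helper)."""
    index: PredicateIndex = defaultdict(list)
    for s, p, o in triples:
        index[p].append((s, o))
    return dict(index)


def is_var(term: str) -> bool:
    # Assumption: SPARQL variables are written with a leading '?'
    return term.startswith("?")


def match_pattern(index: PredicateIndex, pattern: Triple) -> List[Binding]:
    """Match one triple pattern whose predicate is a constant.

    Only the bucket for that predicate is scanned; variable predicates
    are not handled in this sketch.
    """
    s_term, p, o_term = pattern
    results: List[Binding] = []
    for s, o in index.get(p, []):
        binding: Binding = {}
        ok = True
        for term, value in ((s_term, s), (o_term, o)):
            if is_var(term):
                # Reject if the same variable would bind two different values
                if binding.setdefault(term, value) != value:
                    ok = False
            elif term != value:
                ok = False  # constant subject/object does not match
        if ok:
            results.append(binding)
    return results


def hash_join(left: List[Binding], right: List[Binding]) -> List[Binding]:
    """Hash-join two binding lists on their shared variables."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    table: Dict[Tuple[str, ...], List[Binding]] = defaultdict(list)
    for b in right:
        table[tuple(b[v] for v in shared)].append(b)
    joined: List[Binding] = []
    for a in left:
        for b in table.get(tuple(a[v] for v in shared), []):
            joined.append({**a, **b})
    return joined


def evaluate_bgp(index: PredicateIndex, patterns: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern, joining the smallest results first."""
    if not patterns:
        return []
    matched = sorted((match_pattern(index, p) for p in patterns), key=len)
    result = matched[0]
    for m in matched[1:]:
        result = hash_join(result, m)  # no shared variable -> cross product
    return result


# Toy LUBM-style data and a query loosely modeled on LUBM Q9 (illustrative only)
triples = [
    ("Alice", "advisor", "Prof1"),
    ("Bob", "advisor", "Prof2"),
    ("Alice", "takesCourse", "DB101"),
    ("Bob", "takesCourse", "DB101"),
    ("Prof1", "teacherOf", "DB101"),
    ("Prof2", "teacherOf", "AI201"),
]
query = [("?x", "advisor", "?y"), ("?y", "teacherOf", "?z"), ("?x", "takesCourse", "?z")]
index = build_predicate_index(triples)
result = evaluate_bgp(index, query)
print(result)  # [{'?x': 'Alice', '?y': 'Prof1', '?z': 'DB101'}]
```

Ordering joins by intermediate-result size is only a simple heuristic; it stays correct for disconnected patterns but may then build a cross product. In a Spark setting, each per-pattern result could plausibly become a pair RDD keyed by the shared-variable values and be combined with `join`, which triggers a shuffle; this is where partitioning the data and query graphs to limit cross-partition communication, as the abstract describes, would matter.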

 
