Authors: 崔家奇 (Cui Jiaqi); 闫威 (Yan Wei). College of Information, Liaoning University, Shenyang 110036, Liaoning, China
Source: 《计算机应用与软件》 (Computer Applications and Software), 2020, No. 12, pp. 26-31, 45 (7 pages in total)
Abstract: Massive RDF datasets are difficult to manage and query on a single machine. To address this problem, this paper proposes SSQ, a Spark-based SPARQL query method that translates SPARQL queries into RDD operations on the Spark distributed platform. The data graph and the query graph are partitioned so as to increase parallelism and reduce communication overhead between partitions. A predicate index narrows the search space, and join optimization reduces the number of matching operations, which improves query efficiency. The algorithm is implemented on a Spark cluster, tested on the synthetic LUBM dataset, and compared with existing methods. The results show that it executes complex SPARQL queries quickly and scales well.
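Keywords: RDF data; SPARQL query; Spark distributed platform; balanced semantic partitioning; communication overhead
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]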
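Illustration (not from the paper): the abstract mentions a predicate index that narrows the search space and joins that combine matches for individual triple patterns. The sketch below shows that general idea in plain Python on a toy in-memory graph. It is not the authors' SSQ algorithm and does not use Spark; the data, the function names (`build_predicate_index`, `match_pattern`, `join`, `run_query`) and the restriction to constant predicates are all assumptions made for this example. On Spark, the per-predicate lookup and the join would presumably map to RDD operations such as `filter` and `join`, but the record does not say how SSQ does this.

```python
from collections import defaultdict

# Toy RDF triples (subject, predicate, object); all values are made up.
TRIPLES = [
    ("alice", "advisor", "bob"),
    ("carol", "advisor", "bob"),
    ("bob", "worksFor", "dept1"),
    ("dave", "worksFor", "dept2"),
    ("alice", "memberOf", "dept1"),
]


def build_predicate_index(triples):
    """Group (subject, object) pairs by predicate.

    A pattern with a constant predicate then scans only its own group
    instead of the whole triple set.
    """
    index = defaultdict(list)
    for s, p, o in triples:
        index[p].append((s, o))
    return index


def is_var(term):
    """Variables are written SPARQL-style, with a leading '?'."""
    return term.startswith("?")


def match_pattern(index, pattern):
    """Return variable bindings for one triple pattern (constant predicate assumed)."""
    s, p, o = pattern
    results = []
    for subj, obj in index.get(p, []):
        binding = {}
        ok = True
        for term, value in ((s, subj), (o, obj)):
            if is_var(term):
                # A variable used twice in one pattern must bind consistently.
                if binding.get(term, value) != value:
                    ok = False
                    break
                binding[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            results.append(binding)
    return results


def join(left, right):
    """Hash join of two binding lists on the variables they share."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    table = defaultdict(list)
    for b in left:
        table[tuple(b[v] for v in shared)].append(b)
    out = []
    for b in right:
        for match in table.get(tuple(b[v] for v in shared), []):
            out.append({**match, **b})
    return out


def run_query(triples, patterns):
    """Evaluate a list of triple patterns left to right, joining as we go."""
    index = build_predicate_index(triples)
    result = match_pattern(index, patterns[0])
    for pattern in patterns[1:]:
        result = join(result, match_pattern(index, pattern))
    return result


# Roughly: SELECT * WHERE { ?student advisor ?prof . ?prof worksFor ?dept }
QUERY = [("?student", "advisor", "?prof"), ("?prof", "worksFor", "?dept")]
answers = run_query(TRIPLES, QUERY)
```

Here `answers` holds one binding dictionary per matching student. The `memberOf` triple is never scanned, because neither pattern uses that predicate, which is the effect a predicate index is meant to have.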