Dynamic Data Partitioning Strategy Based on the Information Network Model  Cited by: 1

DYNAMIC DATA PARTITION BASED ON INFORMATION NETWORK MODEL

Authors: Chen Shiya (陈诗雅)[1]; Liu Mengchi (刘梦赤)[1] (School of Computer, Wuhan University, Wuhan 430072, Hubei, China)

Affiliation: [1] School of Computer, Wuhan University, Wuhan 430072, Hubei, China

Source: Computer Applications and Software (《计算机应用与软件》), 2018, No. 11, pp. 42-48 (7 pages)

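Funding: National Natural Science Foundation of China, General Program (61672389); National Science Fund for Distinguished Young Scholars (Overseas) (60688201)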

Abstract: To meet the needs of large-scale data management and querying, we designed and developed a distributed, parallel database management system based on the information network model INM (Information Networking Model). In a distributed environment, the way data is partitioned affects both the scalability of the system and the efficiency of query analysis. Based on the data structure and query characteristics of INM, we design a lightweight dynamic data partitioning method. The method combines horizontal and vertical partitioning and takes the INM object as its unit: a data object that has not been stored before is assigned directly to the current operating node and its storage location is recorded; otherwise, the object is assigned to a storage node according to its historical location information. A single INM object may also grow into a large object once the number of associated objects it contains reaches a certain level, which degrades system performance; such large objects are therefore split into multiple small objects that are distributed across different nodes according to a defined strategy. Each processing node in the cluster is given a load threshold. As the data volume grows, new machines are added when the load threshold is exceeded, which preserves the scalability of the system and keeps the data volume balanced across processing nodes. Experimental results show that the method provides good scalability while improving the efficiency of query analysis.
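The placement rules in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the algorithm, so the class name `Partitioner`, the method `place`, the `large_object_limit` cutoff, the use of a fragment count as the load metric, the choice of the least-loaded node for extra fragments, and the overflow behaviour when a preferred node is full are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Partitioner:
    load_threshold: int       # max fragments per node (the abstract's load threshold)
    large_object_limit: int   # hypothetical cutoff: more related objects than this means "large"
    loads: list = field(default_factory=lambda: [0])   # loads[i] = fragments stored on node i
    locations: dict = field(default_factory=dict)      # object id -> recorded home node id

    def _least_loaded(self):
        return min(range(len(self.loads)), key=self.loads.__getitem__)

    def _admit(self, node_id):
        """Return node_id if it has room, else the least-loaded node; add a node if all are full."""
        if self.loads[node_id] < self.load_threshold:
            return node_id
        candidate = self._least_loaded()
        if self.loads[candidate] >= self.load_threshold:
            self.loads.append(0)              # scale out: bring a new machine into the cluster
            candidate = len(self.loads) - 1
        return candidate

    def place(self, obj_id, current_node, related):
        """Return a list of (node_id, fragment) pairs for one INM object."""
        # Known objects follow their recorded location; new ones go to the current node.
        home = self.locations.get(obj_id, current_node)
        # Split a large object into fragments of at most large_object_limit related objects.
        step = self.large_object_limit
        fragments = [related[i:i + step] for i in range(0, len(related), step)] or [[]]
        placement = []
        for index, fragment in enumerate(fragments):
            # Assumed strategy: first fragment stays home, the rest go to the least-loaded node.
            preferred = home if index == 0 else self._least_loaded()
            node_id = self._admit(preferred)
            self.loads[node_id] += 1
            placement.append((node_id, fragment))
        # Record the location only the first time the object is stored.
        self.locations.setdefault(obj_id, placement[0][0])
        return placement
```

In this sketch, calling `place` for a new object records the node it first lands on, later calls for the same object start from that recorded node, and an object with more related objects than `large_object_limit` is returned as several fragments that may sit on different nodes.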

Keywords: information network model; data partitioning; large-object splitting; load threshold

Classification number: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
