Analysis of the Applicability Conditions of Distributed Hash Tables in IoT Big Data Scenarios  (cited by: 16)

On Distributed Hash Table’s Applicability to Internet-of-Things Big Data Management


Authors: AN Yan-Zhe (安彦哲), ZHU Yu-Qing (朱妤晴) [2,3], WANG Jian-Min (王建民)

Affiliations: [1] School of Software, Tsinghua University, Beijing 100084; [2] Beijing National Research Center of Information Science and Technology (Tsinghua University), Beijing 100084; [3] National Engineering Laboratory of Big Data System Software, Beijing 100084

Source: Chinese Journal of Computers (计算机学报), 2021, No. 8, pp. 1679-1695 (17 pages)

Funding: Supported by a sub-project of a Major Program of the National Natural Science Foundation of China (71690231).

Abstract: China's "New Infrastructure" initiative brings challenges to real-world IoT big data management. In response, this paper presents the first theoretical analysis of the applicability conditions of the distributed hash table (DHT), which is the core of the large-scale data management systems used in current state-of-the-practice deployments. The analysis is based on two factors: extremely high write workload and data traffic. Considering the constraints among storage space, bandwidth, and time, the paper analyzes how write workload and network bandwidth affect the conditions for DHT load rebalancing. It concludes that the DHT load-rebalancing design suits only IoT data management scenarios up to a certain scale, and not large-scale IoT data management. Two sets of results validate the theoretical derivation: IoT workload experiments on Cassandra, a widely used DHT-based system, and extensive simulations with a system-level simulator. An application analysis of real cases shows that the theoretical results can be used to analyze and resolve problems that arise when DHT-based systems support IoT data workloads. They can also guide the design of IoT data management systems.

Targeting the emerging application scenarios and the corresponding challenges of the Internet of Things (IoT), this work presents a theoretical analysis of the load rebalancing conditions of the distributed hash table (DHT). It focuses on the unprecedentedly high write workload and on the network bandwidth between nodes. DHT is the state-of-the-practice system structure for large-scale data management, but its design has not taken into account the workload characteristics of IoT applications, the most typical of which is the unprecedented intensity of writes. With respect to write workloads and network bandwidth, this paper derives the applicability conditions of DHT under constraints on bandwidth, storage, and time. For DHT-based IoT data management systems with load balancing, the theoretical results imply the following facts:
(1) the maximum write throughput that a scalable IoT data management system can support is determined by the number n of nodes to scale to and by the network bandwidth of the system nodes;
(2) increasing the number N of system nodes can increase the total storage capacity of the system, but it cannot increase the maximum write throughput that the system can support;
(3) scale-out processes with a large number n of nodes can cause sudden and heavy decreases in the maximum write throughput at each system node, leading to disruptive workload redistribution; and
(4) scaling out by a small number n of nodes is more economical and complies with the pay-as-you-go design consideration of the cloud, but it still does not solve the problem of scalable IoT data management.
Experiments on the widely used DHT-based system Cassandra and extensive simulations on the standard network system simulator ns-3 validate the theoretical results. Real IoT data management use cases demonstrate that the theoretical results of this work can account for the problems encountered when DHT-based systems are used for IoT data storage. The results can also guide the design of IoT data management systems. The results of thi
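The abstract states its conclusions without the derivation. The Python sketch below is a rough illustration of why DHT load rebalancing is tied to network bandwidth. It builds a consistent-hash ring with virtual nodes, the kind of partitioning used by DHT-based stores such as Cassandra. It then measures the fraction of keys that move when n nodes join an N-node ring, and estimates a bandwidth-only lower bound on the migration time. Several choices here are assumptions made for illustration: the names `HashRing`, `moved_fraction`, and `rebalance_seconds`, the MD5-based ring positions, the 128 virtual nodes per node, and the cost model itself. This is not the authors' model, and unlike their analysis it ignores the concurrent write load.

```python
import bisect
import hashlib


def ring_position(label: str) -> int:
    """Map a label to a 64-bit ring position (MD5 is an arbitrary choice here)."""
    return int.from_bytes(hashlib.md5(label.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes (vnodes)."""

    def __init__(self, nodes, vnodes=128):
        self.vnodes = vnodes
        self.points = {}     # ring position -> owning physical node
        self.positions = []  # sorted ring positions, for binary search
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each physical node owns `vnodes` pseudo-random points on the ring.
        for i in range(self.vnodes):
            self.points[ring_position(f"{node}#{i}")] = node
        self.positions = sorted(self.points)

    def owner(self, key):
        # A key belongs to the first point clockwise from its hash, wrapping around.
        idx = bisect.bisect_left(self.positions, ring_position(key)) % len(self.positions)
        return self.points[self.positions[idx]]


def moved_fraction(N, n, num_keys=20000, vnodes=128):
    """Fraction of keys whose owner changes when n nodes join an N-node ring."""
    old = HashRing([f"node{i}" for i in range(N)], vnodes)
    new = HashRing([f"node{i}" for i in range(N + n)], vnodes)
    keys = [f"key{k}" for k in range(num_keys)]
    return sum(old.owner(k) != new.owner(k) for k in keys) / num_keys


def rebalance_seconds(per_node_bytes, N, n, ingress_bytes_per_s):
    """Bandwidth-only lower bound on migration time (hypothetical model).

    Total data D = N * per_node_bytes. After rebalancing, each of the N + n
    nodes holds about D / (N + n), and a new node must receive all of it over
    its own ingress link. Concurrent writes are ignored.
    """
    total = N * per_node_bytes
    return total / (N + n) / ingress_bytes_per_s


if __name__ == "__main__":
    N, n = 8, 2
    print(f"moved fraction: {moved_fraction(N, n):.3f} (n/(N+n) = {n / (N + n):.3f})")
    print(f"rebalance lower bound: {rebalance_seconds(1e12, N, n, 125e6):.0f} s")
```

Take N = 8 and n = 2. The measured fraction should come out close to n/(N+n) = 0.2, because only keys now owned by the new nodes migrate. Now assume 1 TB per node and 1 Gb/s (125 MB/s) of ingress per new node. Then `rebalance_seconds(1e12, 8, 2, 125e6)` gives 6400 s. This bound does not shrink as N grows, because each new node still receives about N/(N+n) of a full node's data.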

Keywords: IoT data management; distributed hash table; load balancing; time-series data; time-series database

CLC number: TP311 [Automation and Computer Technology: Computer Software and Theory]
