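Affiliations: [1] School of Software, Tsinghua University, Beijing 100084; [2] Beijing National Research Center for Information Science and Technology (Tsinghua University), Beijing 100084; [3] National Engineering Research Center for Big Data System Software, Beijing 100084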
Source: Scientia Sinica (Informationis) (《中国科学:信息科学》), 2024, No. 10, pp. 2343-2367 (25 pages)
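Funding: Supported by the National Natural Science Foundation of China (Grant No. 62021002) and the National Key Research and Development Program of China (Grant No. 2021YFB3300500).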
Abstract: Industrial Internet of Things (IIoT) data management is closely tied to the infrastructure of the Digital China initiative and is the foundation for extracting value from industrial big data. Because IIoT data are generated by devices, IIoT data management systems face tougher big data challenges: rapidly growing volume, extreme data arrival velocity, and a varied mix of machine and human workloads. To meet these challenges, such systems must balance loads so as to fully utilize scalable computing resources and improve system performance. Existing load balancing approaches do not fully exploit the time series characteristics typical of IIoT data and therefore fail to address these challenges. In this paper, we target those time series characteristics and model load balancing as an optimization problem constrained to balance read and write workloads separately, matching the read-write separation of IIoT data. We propose a load balancing scheme called TsLBOpt, which integrates three components: a non-intrusive load monitoring and estimation method that simplifies the system architecture; an integer linear programming (ILP) method for solving the optimization problem, whose feasible solution space is expanded by adaptive resharding and replication of data partitions; and a greedy partition redistribution method that minimizes data migration costs when enforcing the load balancing results. TsLBOpt is implemented in IginX, Tsinghua University's open-source time series data management system, which won a Gold Award at the Geneva International Invention Exhibition. Extensive experiments were conducted on a cluster system built from multiple containers. The results show that TsLBOpt improves overall system performance by more than 2 times compared with the widely used hashing method, by more than 10 times compared with the classic heuristic hot-data migration method, and by more than 4 times compared with the state-of-the-art DynaHash method. Furthermore, TsLBOpt can be applied effectively to heterogeneous cluster systems with heterogeneous resources and heterogeneous components.
Keywords: Industrial Internet of Things; IoT data management; load balancing; performance optimization; time series data
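Classification: TN929.5 [Electronics and Telecommunications: Communication and Information Systems]; TP393 [Electronics and Telecommunications: Information and Communication Engineering]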
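Illustration: The abstract mentions a greedy redistribution step that keeps data migration cheap once a balanced grouping of partitions has been computed. The record does not give the algorithm, so the following Python is only a minimal sketch of one plausible greedy strategy, not TsLBOpt's actual method. It assumes the solver has already produced target groups of partitions, and it maps each group to the node that already stores most of that group's data. The function name, parameters, and sample data are all hypothetical.

```python
from itertools import product


def assign_groups_to_nodes(groups, nodes, current, sizes):
    """Greedily map each target partition group to a distinct node.

    Hypothetical sketch, not the paper's algorithm.
    groups:  list of sets of partition ids (assumed solver output)
    nodes:   list of node names, with len(nodes) >= len(groups)
    current: dict partition id -> node that stores it now
    sizes:   dict partition id -> data size (cost to migrate it)
    Returns (dict group index -> node, total migrated size).
    """
    def stay(g, n):
        # Data of group g already on node n would not have to move.
        return sum(sizes[p] for p in groups[g] if current[p] == n)

    # Consider (group, node) pairs with the largest resident data first.
    pairs = sorted(product(range(len(groups)), nodes),
                   key=lambda gn: stay(*gn), reverse=True)

    mapping, used = {}, set()
    for g, n in pairs:
        if g not in mapping and n not in used:
            mapping[g] = n
            used.add(n)

    migrated = sum(sizes[p]
                   for g, n in mapping.items()
                   for p in groups[g] if current[p] != n)
    return mapping, migrated


if __name__ == "__main__":
    groups = [{"a", "b"}, {"c"}]
    current = {"a": "n1", "b": "n2", "c": "n1"}
    sizes = {"a": 10, "b": 1, "c": 5}
    print(assign_groups_to_nodes(groups, ["n1", "n2"], current, sizes))
```

In the sample data, keeping the large partition `a` in place on `n1` costs 6 units of migration, whereas the opposite mapping would cost 10. A greedy choice like this is not guaranteed to be optimal in general; it only trades solution quality for simplicity.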