Simulation of Iterative Elimination of Redundant Data in Wide Area Networks Based on Maximum Gain


Authors: XIAO Jin-tong (肖金桐); WEN Xiao-nan (温晓楠) [2]; LI Ya-juan (李亚娟)

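Affiliations: [1] Hebei University of Water Resources, Cangzhou, Hebei 061000, China; [2] Hebei Agricultural University, Huanghua, Hebei 061100, China; [3] Department of Computer Science, Hebei University of Water Resources, Cangzhou, Hebei 061000, China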

Source: Computer Simulation (《计算机仿真》), 2024, No. 10, pp. 371-375 (5 pages)

Abstract: Data volumes in wide area networks (WANs) are usually very large, and the data may be distributed across different geographic locations and network nodes, which makes eliminating redundant data more complex. To address this, an iterative redundant-data elimination algorithm for WANs based on maximum gain is proposed. A word-segmentation dictionary is constructed and string lengths are computed to process character data. Using related information such as time, space, and attributes, the average Euclidean distance between a missing value and its neighboring data is calculated to obtain their similarity, and the missing value is filled accordingly. The ChiMerge algorithm is used to test the independence of the data, adjacent intervals are merged using a combination of a threshold and the number of classes, and outliers are deleted to reduce the influence of erroneous data on redundancy elimination. A decision tree model is then built, with classification rules established according to the maximum gain value, to classify the data. Inter-class similarity is calculated to detect redundant data, and an eliminator is set up to remove the redundant data iteratively in chronological order. Experimental results show that the proposed algorithm improves the data reduction rate while keeping throughput within system requirements.
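The abstract does not define "maximum gain" precisely. As a minimal sketch, assuming it refers to information gain in the ID3 sense (an assumption, not confirmed by the record), the snippet below shows how a decision tree could pick its splitting attribute. The field names `node` and `proto`, the labels, and the toy records are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by partitioning the rows on `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    # Weighted entropy of the partitions produced by the split.
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder


def best_split(rows, labels, attrs):
    """Return the attribute whose split yields the maximum gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy records; names and values are illustrative only.
rows = [
    {"node": "A", "proto": "tcp"},
    {"node": "A", "proto": "udp"},
    {"node": "B", "proto": "tcp"},
    {"node": "B", "proto": "udp"},
]
labels = ["redundant", "redundant", "unique", "unique"]

print(best_split(rows, labels, ["node", "proto"]))
```

In this toy data, splitting on `node` separates the two classes completely while `proto` does not separate them at all, so `node` is selected. The paper's actual gain measure, features, and classification rules may differ.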

Keywords: cloud environment; wide area network; redundant data; iterative elimination; decision tree; maximum gain

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
