Authors: JIA Wen-gang [1], GAO Jin-tao
Affiliations: [1] College of Information Engineering, Inner Mongolia University of Technology, Hohhot, Inner Mongolia 010051, China; [2] Inner Mongolia Special Equipment Inspection Institute, Hohhot, Inner Mongolia 010051, China
Source: Computer Simulation (《计算机仿真》), 2021, No. 12, pp. 241-244, 249 (5 pages)
Funding: Scientific Research Project of Inner Mongolia University of Technology (ZY201902).
Abstract: Existing algorithms for filtering redundant data points do not extract or classify the features of those points, which leads to poor filtering efficiency, low accuracy, and excessive storage overhead. This paper therefore designs an HDFS-based algorithm for filtering redundant points in massive log data. First, within the HDFS architecture, the time series of data samples is used to obtain the features of redundant data points, and these features are classified to improve filtering efficiency. Second, the reduction rate (the ratio of the number of bytes carrying redundant features to the number of ordinary bytes before filtering) and the misjudgment rate are calculated to reduce storage overhead. Third, to improve accuracy and filtering performance, the overall similarity is computed from the prominent features of the redundant points. Finally, the redundant data points are filtered out by means of a mean-shift transfer function. Experimental results show that the algorithm achieves higher filtering efficiency, higher accuracy, and lower storage overhead.
Keywords: redundant data points; redundancy features; reduction rate calculation; mean-shift transfer function
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]