Cleaning Algorithm of Network Multiple False Data Based on Minimum Hash  (times cited: 1)

Authors: WANG Ying [1], LI Ke-jing [2]

Affiliations: [1] College of Humanities & Information, Changchun University of Technology, Changchun, Jilin 130122, China; [2] College of Computer Science and Technology, Changchun University, Changchun, Jilin 130022, China

Source: Computer Simulation (计算机仿真), 2023, No. 5, pp. 511-514, 519 (5 pages)

Abstract: Traditional data-cleaning methods do not measure the similarity of the real attributes of data, so they clean multi-channel false network data poorly. This paper therefore proposes a cleaning algorithm for multi-channel false network data based on minimum hashing (MinHash). First, the multi-channel network data are integrated, a prior knowledge base is constructed, and the features of the correlation model are summarized by Bayesian classification. Based on the type each encoding belongs to, the data encodings are classified by posterior probability and the encodings of the multi-channel data are converted. Next, the set with the lower hash level is taken as fingerprint information, two multi-channel data sets are established, and MinHash is used to compute the similarity between them; this similarity is then used to measure the real attributes of the data. Finally, a feed-forward neural network data-cleaning model is built and the training deviation over the network's samples is computed. The parameter variables and the population are initialized, the roulette-wheel method is used to obtain uniformly distributed random values, and the variables are fed into the data-cleaning model, which repeatedly performs selection, crossover and mutation to clean false data with high precision. Simulation results show that, compared with traditional methods, the proposed method achieves a higher recall rate and significantly improves data-cleaning efficiency, providing users with a safer and more reliable network communication environment.
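The MinHash similarity step can be illustrated with the standard textbook construction: for each of k hash functions only the minimum hash value over a set is kept, and the fraction of positions where two signatures agree estimates the Jaccard similarity of the two sets. This is a minimal sketch under stated assumptions, not the paper's implementation: records are assumed to be compared as sets of character 3-grams, the seeded universal hash family h(x) = (a*x + b) mod p over CRC32 values is an assumed choice, and every function name here is hypothetical.

```python
import random
import zlib

# Mersenne prime 2^61 - 1, larger than any CRC32 value (assumed hash modulus)
_PRIME = (1 << 61) - 1


def shingles(record: str, k: int = 3) -> set[str]:
    """Split a record into overlapping character k-grams (assumed set representation)."""
    record = record.lower()
    if len(record) <= k:
        return {record}
    return {record[i:i + k] for i in range(len(record) - k + 1)}


def make_hash_params(num_hashes: int, seed: int = 42) -> list[tuple[int, int]]:
    """Draw (a, b) pairs for the universal hash family h(x) = (a*x + b) mod p."""
    rng = random.Random(seed)
    return [(rng.randrange(1, _PRIME), rng.randrange(0, _PRIME)) for _ in range(num_hashes)]


def minhash_signature(items: set[str], params: list[tuple[int, int]]) -> list[int]:
    """For each hash function, keep only the smallest hash value over the set."""
    base = [zlib.crc32(item.encode("utf-8")) for item in items]
    return [min((a * x + b) % _PRIME for x in base) for a, b in params]


def estimated_jaccard(sig1: list[int], sig2: list[int]) -> float:
    """Fraction of agreeing minima, an estimate of |A & B| / |A | B|."""
    matches = sum(1 for u, v in zip(sig1, sig2) if u == v)
    return matches / len(sig1)


def exact_jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    params = make_hash_params(256)
    a = shingles("user=alice; ip=10.0.0.7; action=login")
    b = shingles("user=alice; ip=10.0.0.7; action=logout")
    sig_a, sig_b = minhash_signature(a, params), minhash_signature(b, params)
    print(round(exact_jaccard(a, b), 3), round(estimated_jaccard(sig_a, sig_b), 3))
```

The signatures have a fixed length regardless of record size, which is what makes MinHash attractive for comparing many records. With 256 hash functions the standard error of the estimate is roughly at most 0.03 (sqrt(J(1 - J)/k) with J = 0.5, k = 256) for an ideal min-wise hash family.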
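The genetic-neural-network stage can be sketched as follows: the weights of a small feed-forward network are evolved with roulette-wheel selection, crossover and mutation so that the training deviation (here the mean squared error) decreases. This is a minimal sketch, not the paper's model. The 2-3-1 network shape, the toy `DATA` set and its features, fitness = 1 / (1 + MSE), single-point crossover, Gaussian mutation, elitism and all names are assumptions made for illustration.

```python
import math
import random

# Hypothetical toy data: (features, label), label 1 = false record, 0 = genuine.
# Assumed features: [similarity to trusted records, encoding-mismatch flag].
DATA = [([0.95, 0.0], 0), ([0.90, 0.0], 0), ([0.85, 1.0], 0),
        ([0.20, 1.0], 1), ([0.10, 0.0], 1), ([0.30, 1.0], 1)]

N_IN, N_HID = 2, 3
# Hidden layer: N_HID * (N_IN weights + 1 bias); output: N_HID weights + 1 bias.
N_WEIGHTS = N_HID * (N_IN + 1) + N_HID + 1


def sigmoid(z: float) -> float:
    z = max(-60.0, min(60.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def forward(w: list[float], x: list[float]) -> float:
    """Feed-forward pass of a 2-3-1 network whose weights are packed in the flat list w."""
    hidden = []
    for j in range(N_HID):
        base = j * (N_IN + 1)
        z = sum(w[base + i] * x[i] for i in range(N_IN)) + w[base + N_IN]
        hidden.append(sigmoid(z))
    out = N_HID * (N_IN + 1)
    z = sum(w[out + j] * hidden[j] for j in range(N_HID)) + w[out + N_HID]
    return sigmoid(z)


def training_error(w: list[float]) -> float:
    """Mean squared deviation between network outputs and labels (assumed error measure)."""
    return sum((forward(w, x) - y) ** 2 for x, y in DATA) / len(DATA)


def roulette_select(pop, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    r = rng.uniform(0, sum(fitness))  # uniformly distributed random value on the wheel
    acc = 0.0
    for ind, f in zip(pop, fitness):
        acc += f
        if acc >= r:
            return ind
    return pop[-1]  # guard against floating-point round-off


def evolve(pop_size=40, generations=150, p_cross=0.8, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    # Initialize the population of weight vectors uniformly in [-1, 1]
    pop = [[rng.uniform(-1, 1) for _ in range(N_WEIGHTS)] for _ in range(pop_size)]
    best = min(pop, key=training_error)
    for _ in range(generations):
        # Lower training error -> higher fitness
        fitness = [1.0 / (1.0 + training_error(ind)) for ind in pop]
        children = [best[:]]  # elitism: the best individual so far survives unchanged
        while len(children) < pop_size:
            p1 = roulette_select(pop, fitness, rng)
            p2 = roulette_select(pop, fitness, rng)
            if rng.random() < p_cross:  # single-point crossover
                cut = rng.randrange(1, N_WEIGHTS)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Gaussian mutation applied gene by gene with probability p_mut
            child = [g + rng.gauss(0, 0.5) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        cand = min(pop, key=training_error)
        if training_error(cand) < training_error(best):
            best = cand
    return best
```

For example, after `best = evolve()`, calling `forward(best, [0.15, 1.0])` scores a hypothetical low-similarity record; under this sketch's labelling, scores near 1 would mark it as false data. Elitism keeps the training error from ever increasing between generations, which is a common design choice when a GA is used to train network weights.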

Keywords: minimum hash (MinHash); false data; data cleaning; encoding conversion; genetic neural network

CLC number: TP185 [Automation and Computer Technology: Control Theory and Control Engineering]