Authors: WANG Ying (王影) [1]; LI Ke-jing (李柯景) [2]
Affiliations: [1] College of Humanities & Information, Changchun University of Technology, Changchun, Jilin 130122, China; [2] College of Computer Science and Technology, Changchun University, Changchun, Jilin 130022, China
Source: Computer Simulation (《计算机仿真》), 2023, No. 5, pp. 511-514, 519 (5 pages)
Abstract: Traditional data cleaning methods do not measure the similarity of the real attributes of data, so they clean multi-channel false network data poorly. This paper therefore proposes a minimum-hash (MinHash) algorithm for cleaning multi-channel false network data. Multi-channel network data are first integrated and a prior knowledge base is built; the features of the correlation model are then summarized by Bayesian classification. Posterior-probability codes are classified according to code type, and the multi-channel data codes are converted. The set with the lower hash level is taken as fingerprint information, two multi-channel data sets are established, and minimum hash is used to compute their similarity, which in turn measures the real attributes of the data. A feed-forward neural network data-cleaning model is then built and the training deviation of the network model samples is calculated, while the parameter variables and the population are initialized. Roulette-wheel selection is used to obtain uniformly distributed random values, and all variables are introduced into the data-cleaning model, which repeatedly performs selection, crossover, and mutation to achieve high-precision cleaning of false data. Simulation results show that, compared with traditional methods, the proposed method achieves a higher recall rate and significantly improved cleaning efficiency, providing users with a safer and more reliable network communication environment.
Keywords: minimum hash; false data; data cleaning; code conversion; genetic neural network
Classification: TP185 [Automation and Computer Technology: Control Theory and Control Engineering]
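
The abstract mentions using minimum hash to compute the similarity between two data sets but gives no implementation details. As a rough illustration only, the following is a generic MinHash sketch in Python, not the authors' code. The character 3-gram shingling, the seeded BLAKE2b hash, the 128 hash functions, and all function names are assumptions made for this example.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of character k-grams of a record (assumed tokenization)."""
    text = text.strip().lower()
    if len(text) <= k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def _seeded_hash(token, seed):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=128):
    """For each seeded hash function, keep the minimum hash over the set."""
    return [min(_seeded_hash(t, seed) for t in items) for seed in range(num_hashes)]


def estimate_similarity(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(set_a & set_b) / len(set_a | set_b)
```

A record pair whose estimated similarity exceeds some threshold could be treated as near-duplicate; the threshold and what follows from it are left open here because the abstract does not specify them.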