Authors: ZHANG Yuan (张媛); ZHANG Hui-jun (张慧钧) (School of Modern Manufacturing Engineering, Heilongjiang University of Technology, Jixi, Heilongjiang 158100, China; College of Modern Manufacturing Engineering, Yan'an University, Yan'an, Shaanxi 716000, China)
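Affiliations: [1] School of Modern Manufacturing Engineering, Heilongjiang University of Technology, Jixi, Heilongjiang 158100 [2] School of Mathematics and Computer Science, Yan'an University, Yan'an, Shaanxi 716000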
Source: Computer Simulation (《计算机仿真》), 2023, No. 4, pp. 402-406 (5 pages)
Funding: Natural Science Foundation of Heilongjiang Province (LH2022A023).
Abstract: Massive data in network environments is highly complex: it contains large amounts of structured, semi-structured, and unstructured data, and data blocks tend to be highly similar in length and position. Existing methods for identifying similar data are task-intensive and occupy a large amount of memory. To further improve data utilization and reduce data redundancy, a modeling and simulation method for data similarity identification based on an ordered clustering equation is proposed. Wavelet technology and data deduplication are first used to denoise the network data, and the feature vectors of the network data are then optimized and extracted by presetting the data set centers. On this basis, the similarity between feature vectors is analyzed in both the time and space dimensions, and a data similarity identification model is constructed from a point cloud classification network and the ordered clustering equation. Experimental results show that, when the proposed method is used to identify data similarity, the normalized mutual information value is 0.12, indicating that the method is highly accurate, and that for data sets of different sizes the method completes the identification of all data similarity within 0.6 s. These results demonstrate that the method achieves high accuracy and efficiency in application.
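Keywords: wavelet technology; data deduplication technology; feature vector similarity; point cloud classification network; ordered clustering equation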
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]