Authors: Hou Kaimao; Han Qingmin; Wu Yunfeng; Huang Bing; Zhang Jiufa; Chai Chuchu
Affiliation: [1] The 6th Research Institute of China Electronics Corporation, Beijing 100083, China
Source: Information Technology and Network Security (《信息技术与网络安全》), 2022, No. 4, pp. 71-76 (6 pages)
Abstract: With the development of digital technology, the volume of data that must be transmitted and stored in every field has risen sharply. A large proportion of that data is duplicated, which raises the cost of using the data and lowers the efficiency of processing it. Domain name data is stored in large volumes and has very high processing-speed requirements. To reduce the storage cost of the domain name resolution system and improve transmission efficiency, this paper builds on existing data deduplication techniques by introducing the Simhash algorithm and, drawing on the structural characteristics of domain name data, improving the tokenization and fingerprint computation steps. The result is a Simhash-based method for removing duplicate domain name data. Experimental results show that, compared with traditional data deduplication techniques, the method removes duplicate domain name data more efficiently and has good practical value.
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
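The abstract names Simhash fingerprints as the basis of the deduplication method but does not give the paper's tokenization or fingerprint details. The sketch below is therefore a generic, textbook-style Simhash and not the authors' algorithm. The helper names (`tokenize`, `simhash`, `hamming`, `deduplicate`), the position-tagged label tokens, the unit token weights, the use of MD5 as the per-token hash, and the `max_distance` threshold are all assumptions made for illustration.

```python
import hashlib


def tokenize(domain: str) -> list[str]:
    """Assumed tokenization: normalize case, drop the trailing root dot,
    and tag each label with its position counted from the TLD."""
    labels = [l for l in domain.lower().rstrip(".").split(".") if l]
    return [f"{i}:{label}" for i, label in enumerate(reversed(labels))]


def simhash(tokens: list[str], bits: int = 64) -> int:
    """Generic Simhash: each token votes +1/-1 on every bit position
    (all tokens carry weight 1 here, which is an assumption)."""
    votes = [0] * bits
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit token hash
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, v in enumerate(votes):
        if v > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def deduplicate(domains: list[str], max_distance: int = 0) -> list[str]:
    """Keep a domain only if no already-kept fingerprint lies within
    max_distance bits of it (linear scan; fine for a small demo)."""
    kept, fingerprints = [], []
    for d in domains:
        fp = simhash(tokenize(d))
        if all(hamming(fp, seen) > max_distance for seen in fingerprints):
            kept.append(d)
            fingerprints.append(fp)
    return kept


if __name__ == "__main__":
    print(deduplicate(["example.com", "Example.COM.", "example.org"]))
```

With `max_distance=0` only records whose fingerprints match exactly are dropped; raising it would also merge near-duplicates, at the risk of false merges. The linear scan over kept fingerprints is quadratic overall, so a production system would need an index over fingerprints instead.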