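Authors: TANG Linchuan; DENG Siyu; WU Yanxue; WEN Liuying [1] (School of Computer Science, Southwest Petroleum University, Chengdu 610500, China)
Affiliation: [1] School of Computer Science, Southwest Petroleum University
Source: Journal of Computer Applications (《计算机应用》), 2019, No. 9, pp. 2789-2794 (6 pages)
Funding: Open Project of the Key Laboratory of Oceanographic Big Data Mining and Application of Zhejiang Province (OBDMA201601)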
Abstract: A large number of duplicate images in a database not only degrades learner performance but also consumes considerable storage space. For massive image deduplication, a duplicate detection algorithm for massive images based on local probing of pHash (perceptual hash) segments is proposed. First, the pHash values of all images are generated. Second, each pHash value is divided into several parts of equal length; if two images agree on the value of any one pHash part, the two images may be duplicates. Finally, the transitivity of image duplication is discussed, and algorithms are implemented for both the transitive and the non-transitive case. Experimental results show that the proposed algorithms are highly efficient on massive image sets: with the similarity threshold set to 13, the transitive algorithm detects duplicates among nearly 300 000 images in only about 2 min, with an accuracy of about 53%.
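Keywords: duplicate image detection; massive data; perceptual hash; local probing; transitivity
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]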
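The segment-probing idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the record does not confirm: hashes are taken to be precomputed 64-bit integers, the similarity threshold is interpreted as a maximum Hamming distance, the "transitive" case is modeled by merging verified pairs with union-find, and all function names (`split_hash`, `candidate_pairs`, `duplicate_pairs`, `duplicate_groups`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

HASH_BITS = 64  # assumption: each pHash is a 64-bit integer


def split_hash(h, n_parts, bits=HASH_BITS):
    """Split a hash into n_parts equal-length segments, most significant first."""
    if bits % n_parts:
        raise ValueError("bits must be divisible by n_parts")
    width = bits // n_parts
    mask = (1 << width) - 1
    return [(h >> (width * (n_parts - 1 - i))) & mask for i in range(n_parts)]


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def candidate_pairs(hashes, n_parts):
    """Pairs (i, j), i < j, whose hashes share at least one identical segment."""
    buckets = defaultdict(list)
    for idx, h in enumerate(hashes):
        for pos, seg in enumerate(split_hash(h, n_parts)):
            buckets[(pos, seg)].append(idx)  # same position and same value
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(members, 2))
    return pairs


def duplicate_pairs(hashes, n_parts, threshold):
    """Non-transitive view: verified pairs within the Hamming threshold."""
    return sorted(
        (i, j)
        for i, j in candidate_pairs(hashes, n_parts)
        if hamming(hashes[i], hashes[j]) <= threshold
    )


def duplicate_groups(hashes, n_parts, threshold):
    """Transitive view: merge verified pairs into groups with union-find."""
    parent = list(range(len(hashes)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in duplicate_pairs(hashes, n_parts, threshold):
        parent[find(j)] = find(i)

    groups = defaultdict(list)
    for idx in range(len(hashes)):
        groups[find(idx)].append(idx)
    return sorted(g for g in groups.values() if len(g) > 1)


if __name__ == "__main__":
    a = 0x0000000000000000
    b = a ^ 0b11           # 2 bits away from a
    c = b ^ (0b11 << 16)   # 2 bits away from b, 4 bits away from a
    d = 0xFFFFFFFFFFFFFFFF
    hashes = [a, b, c, d]
    print(duplicate_pairs(hashes, n_parts=4, threshold=3))
    print(duplicate_groups(hashes, n_parts=4, threshold=3))
```

In the demo, images 0 and 2 are each within the threshold of image 1 but not of each other, so the pairwise view reports two pairs while the transitive view places all three in one group. Bucketing by segment avoids comparing every pair of images; by the pigeonhole principle, two hashes that differ in at most k bits must agree on at least one of k + 1 segments, so with fewer segments than that the probe can miss some near-duplicates.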