Duplicate detection algorithm for massive images based on pHash block detection

Cited by: 4

Authors: 唐林川 (TANG Linchuan); 邓思宇 (DENG Siyu); 吴彦学 (WU Yanxue); 温柳英 (WEN Liuying)[1] (School of Computer Science, Southwest Petroleum University, Chengdu 610500, China)

Affiliation: [1] School of Computer Science, Southwest Petroleum University

Source: Journal of Computer Applications (《计算机应用》), 2019, No. 9, pp. 2789-2794 (6 pages)

Funding: Open Project of the Zhejiang Provincial Key Laboratory of Ocean Big Data Mining and Application (OBDMA201601)

Abstract: The large number of duplicate images in a database not only degrades the performance of the learner but also consumes a large amount of storage space. For massive image deduplication, a duplicate detection algorithm based on block-wise local detection of pHash (perceptual hashing) values was proposed. Firstly, the pHash values of all images were generated. Secondly, each pHash value was divided into several equal-length parts; if two images had the same value in any one of these parts, the two images might be duplicates. Finally, the transitivity of image duplication was discussed, and algorithms were implemented for both the transitive and the non-transitive cases. Experimental results show that the proposed algorithms are highly efficient on massive image sets: with the similarity threshold set to 13, the transitive algorithm needed only about 2 minutes to detect duplicates among nearly 300 000 images, with an accuracy of 53%.
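The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the idea: compute a 64-bit pHash per image, split it into equal-length blocks to bucket candidate pairs, verify candidates against the Hamming-distance threshold of 13 reported in the paper, and merge verified duplicates transitively with a union-find structure. This is not the authors' implementation; the block count of 4, the use of the third-party imagehash and Pillow packages, the interpretation of the similarity threshold as a Hamming distance over the full hash, and all helper names are assumptions made for illustration.

```python
# Sketch of block-based pHash duplicate detection with transitive grouping.
# Requires the third-party packages "imagehash" and "Pillow".
from collections import defaultdict
from itertools import combinations
from pathlib import Path

import imagehash
from PIL import Image

NUM_BLOCKS = 4   # split the 64-bit pHash into 4 equal blocks (an assumption)
THRESHOLD = 13   # similarity threshold reported in the paper's experiments


def phash_bits(path):
    """Return the 64-bit pHash of an image as a flat list of 0/1 bits."""
    return imagehash.phash(Image.open(path)).hash.flatten().tolist()


def split_blocks(bits, num_blocks=NUM_BLOCKS):
    """Split the bit list into equal-length blocks (tuples, so they are hashable)."""
    size = len(bits) // num_blocks
    return [tuple(bits[i * size:(i + 1) * size]) for i in range(num_blocks)]


def candidate_pairs(paths):
    """Bucket images by (block index, block value); images that share the same
    value in the same block position become candidate duplicate pairs."""
    hashes = {p: phash_bits(p) for p in paths}
    buckets = defaultdict(list)
    for p, bits in hashes.items():
        for i, block in enumerate(split_blocks(bits)):
            buckets[(i, block)].append(p)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return hashes, pairs


def find(parent, x):
    """Union-find root lookup with path compression."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def group_duplicates(paths):
    """Verify candidate pairs with the full-hash Hamming distance, then merge
    duplicates transitively with union-find and return the duplicate groups."""
    hashes, pairs = candidate_pairs(paths)
    parent = {p: p for p in paths}
    for a, b in pairs:
        dist = sum(x != y for x, y in zip(hashes[a], hashes[b]))
        if dist <= THRESHOLD:
            parent[find(parent, a)] = find(parent, b)
    groups = defaultdict(list)
    for p in paths:
        groups[find(parent, p)].append(p)
    return [g for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    # Hypothetical usage: scan a local "images" folder and print duplicate groups.
    images = sorted(str(p) for p in Path("images").glob("*.jpg"))
    for group in group_duplicates(images):
        print(group)
```

In this sketch the block buckets act purely as a filter: two images are compared in full only when at least one block matches exactly, which is what keeps the candidate step cheap on hundreds of thousands of images. The union-find merge corresponds to the transitive variant described in the abstract; reporting the verified pairs directly, without the merge, would correspond to the non-transitive case.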

Keywords: duplicate image detection; massive data; perceptual hash (pHash); local detection; transitivity

CLC number: TP391 [Automation and Computer Technology - Computer Application Technology]

 
