FAIDA: A Fast and Accurate Image Deduplication Approach

Cited by: 2

Authors: Chen Ming[1], Wang Shupeng[2], Yun Xiaochun[1,3], Wu Guangjun[2], Li Chao[3]

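Affiliations: [1] National Engineering Laboratory for Disaster Backup and Recovery, Beijing University of Posts and Telecommunications, Beijing 100876; [2] Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093; [3] National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029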

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2013, No. 1, pp. 101-110 (10 pages)

Funding: National Natural Science Foundation of China (61003260; 61202067; 61271275)

Abstract: Deduplication improves storage utilization by eliminating redundant copies of data and replacing them with logical pointers to the unique copy, and it is now widely used in backup and archive systems. However, most existing deduplication systems hash data chunks and compare the hashes to decide whether the chunks are redundant. This hash-based exact match on the bit stream is too strict for many applications, such as image deduplication. To solve this problem, a fast and accurate image deduplication approach is presented. We first define duplicate images according to the characteristics of Web images, and then divide image deduplication into two stages: duplicate image detection and duplicate image deduplication. In the first stage, we use perceptual hashing to improve image retrieval speed and multiple filters to improve image retrieval accuracy. In the second stage, we use fuzzy logic reasoning, which simulates the process of human thinking, to select a proper centroid image from each duplicate image set. Experimental results demonstrate that the proposed approach not only detects duplicate images quickly and accurately, but also meets users' perceptual requirements in the selection of centroid images.
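The contrast the abstract draws between exact hash matching and perceptual hashing can be illustrated with a minimal sketch. The abstract does not say which perceptual hash FAIDA uses, so the block below swaps in the simple average hash (aHash) as a stand-in. The function names, the 8x8 hash size, and the distance threshold of 5 are all assumptions made for illustration, not details taken from the paper.

```python
import hashlib


def exact_digest(pixels):
    """Bit-stream hash: changes completely if any single pixel changes."""
    return hashlib.sha256(bytes(v for row in pixels for v in row)).hexdigest()


def average_hash(pixels, hash_size=8):
    """Average hash (aHash) of a grayscale image given as a 2D list of 0-255 ints.

    Assumes the image height and width are multiples of hash_size.
    """
    rows, cols = len(pixels), len(pixels[0])
    block_h, block_w = rows // hash_size, cols // hash_size

    # Downsample to hash_size x hash_size by averaging each block.
    small = []
    for r in range(hash_size):
        for c in range(hash_size):
            block = [
                pixels[y][x]
                for y in range(r * block_h, (r + 1) * block_h)
                for x in range(c * block_w, (c + 1) * block_w)
            ]
            small.append(sum(block) / len(block))

    # One bit per cell: 1 if the cell is brighter than the overall mean.
    mean = sum(small) / len(small)
    bits = 0
    for value in small:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def is_duplicate(pixels_a, pixels_b, threshold=5):
    """Treat two images as duplicates if their hashes differ in few bits.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return hamming_distance(average_hash(pixels_a), average_hash(pixels_b)) <= threshold


if __name__ == "__main__":
    # A 16x16 horizontal gradient and a slightly brightened copy of it.
    image = [[x * 16 for x in range(16)] for _ in range(16)]
    brighter = [[min(255, v + 3) for v in row] for row in image]

    print(exact_digest(image) == exact_digest(brighter))  # exact match fails
    print(is_duplicate(image, brighter))                   # perceptual match succeeds
```

A uniform brightness shift changes every byte, so the SHA-256 digests differ, while the average hash stays the same because each cell keeps its position relative to the mean. A full pipeline along the lines the abstract describes would add further filters after this coarse match and a separate step for choosing the centroid image, neither of which is sketched here.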

Keywords: image deduplication; perceptual hashing; multiple filtering; centroid image; fuzzy logic reasoning

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
