量化编码的分层可通航小世界图算法  被引量:1

Hierarchical navigable small world graph algorithms based on quantization coding

在线阅读下载全文

作  者:李秋珍[1] 白兴强 李立夏[1] 王赢[1] LI Qiu-zhen;BAI Xing-qiang;LI Li-xia;WANG Ying(Wuhan Digital Engineering Institute,Wuhan 430074;School of Computer Science and Technology,Huazhong University of Science and Technology,Wuhan 430074,China)

机构地区:[1]武汉数字工程研究所,湖北武汉430074 [2]华中科技大学计算机科学与技术学院,湖北武汉430074

出  处:《计算机工程与科学》2019年第4期618-625,共8页Computer Engineering & Science

基  金:军委装备发展部科研订购局"十三五"装备预研领域基金(61401320501)

摘  要:随着大数据和人工智能的高速发展,针对多媒体数据的结构化处理与基于内容的检索受到极大的关注,面对多媒体数据结构化后的海量高维特征向量,如何快速、准确地检索是人工智能处理大规模数据所必须解决的问题。最近提出的分层可通航小世界图HNSW检索算法在多个公开数据集取得了最佳的性能表现,但该算法存在内存开销大的问题。而基于量化编码的检索算法能够压缩数据集向量,大幅度降低内存占用。将量化编码和分层可通航小世界图算法结合,提出了2种基于量化编码改进的HNSW算法,分别是使用标量量化编码向量的HNSWSQ算法和使用乘积量化编码向量的HNSWPQ算法,2种算法使用不同的量化策略存储原始向量编码,以降低内存开销,再通过HNSW算法建立索引达到缩短检索耗时的目的。其中HNSWSQ算法在多个数据集上获得了与HNSW算法相近的查全率和平均检索耗时,而内存开销大幅降低。实验结果表明,HNSWSQ算法在SIFT-1M和GIST-1M数据集上的内存开销比HNSW算法分别降低了45.1%和70.4%。With the rapid development of big data and artificial intelligence, structured processing and content-based retrieval for multimedia data have received attention. Facing the massive high-dimensional feature vectors after multimedia data structuralization, how to achieve fast and accurate search results is a problem that artificial intelligence must solve when dealing with large-scale data. The recently proposed hierarchical navigable small world (HNSW) graph retrieval algorithm achieves the best performance in a number of public data sets. However, the algorithm suffers large memory overhead. Retrieval algorithms based on quantization coding can greatly compress the vectors of data sets and greatly reduce memory usage. Combining quantization coding with the hierarchical navigable small-world graph algorithm, we propose two improved hierarchical navigable small world graph algorithms based on quantization coding, namely HNSWSQ algorithm using scalar quantization-encoding vectors and HNSWPQ algorithm using product-quantized coding vectors. The two algorithms adopt different quantization strategies to encode and store the original vectors to reduce memory overhead, and then the HNSW algorithm is used to create an index to reduce the time-consumption of the retrieval. The HNSWSQ algorithm has similar recall rate and average retrieval time to those of the HNSW algorithm on multiple data sets, and the memory overhead is greatly reduced. Experimental results show that compared with the HNSW algorithm, the memory overhead of the HNSWSQ algorithm on the SIFT-1M and GIST-1M data sets is reduced by 45.1% and 70.4%, respectively.

关 键 词:近似最近邻检索 分层可通航小世界图算法 乘积量化 标量量化 相似性搜索 高维数据索引 

分 类 号:TP311.13[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象