基于k-means特征的适应性近似最近邻搜索算法  

Adaptive approximate nearest neighbor search algorithm based on k-means features

在线阅读下载全文

作  者:胡文洁 杨凯祥 谭宗元 HU Wenjie;YANG Kaixiang;TAN Zongyuan(School of Computer Science and Technology,Donghua University,Shanghai 201620,China)

机构地区:[1]东华大学计算机科学与技术学院,上海201620

出  处:《智能计算机与应用》2023年第9期80-84,共5页Intelligent Computer and Applications

摘  要:在基于倒排索引和HNSW索引结构的最近邻搜索算法中,由于所有查询点使用固定的终止条件进行近似最近邻搜索,从而导致某些查询点在搜索路径上访问了不必要的数据点。因此,本文针对十亿规模数据集,在IVF-HNSW算法的基础上,根据数据点的k-means特征和真实最小访问点,建立神经网络回归模型。通过模型,动态预测每个查询点在HNSW索引中找到最近邻所需要搜索的质心个数,以及在IVF中需要搜索的倒排列表的个数,最终每个查询点能够通过适应性搜索,减少需要访问的数据库向量的个数,进而降低总体搜索所需要的查询时间。实验结果表明,优化后的自适应搜索算法与原始IVF-HNSW算法相比,在最高召回率下,平均查询时间最多可降低27%。In the nearest neighbor search algorithm based on the inverted index and HNSW index structure,because all query points use a fixed termination condition to search the nearest neighbor,some query points visit unnecessary data points on the search path.Therefore,for the billion-scale data set,based on the IVF-HNSW algorithm,this paper establishes a neural network regression model according to the k-means characteristics of the data points and the real minimum access point and dynamically predicts each query point through the model to find the nearest neighbor in the HNSW index.The number of centroids to search and the number of inverted lists to search in IVF.Finally,each query point can be adaptively searched to reduce the number of database vectors that need to be accessed,thereby reducing the query time required for the overall search.The experimental results show that the optimized adaptive search algorithm can reduce the average query time by up to 27%at the highest recall rate compared with the original IVF-HNSW algorithm.

关 键 词:最近邻搜索 倒排索引 HNSW索引 适应性搜索 

分 类 号:TP399[自动化与计算机技术—计算机应用技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象