基于容忍因子的近似最近邻混合查询算法  被引量:1

Approximate nearest neighbor hybrid query algorithm based on tolerance factor

在线阅读下载全文

作  者:贺广福 薛源海[1] 陈翠婷[1,2] 俞晓明 刘欣然[1,3] 程学旗 HE Guangfu;XUE Yuanhai;CHEN Cuiting;YU Xiaoming;LIU Xinran;CHENG Xueqi(Institute of Computing Technology,Chinese Academy of Sciences,Beijing 100190,China;University of Chinese Academy of Sciences,Beijing 101408,China;Beijing University of Posts and Telecommunications,Beijing 100876,China)

机构地区:[1]中国科学院计算技术研究所,北京100190 [2]中国科学院大学,北京101408 [3]北京邮电大学,北京100876

出  处:《大数据》2024年第1期17-34,共18页Big Data Research

基  金:国家自然科学基金项目(No.U21B2046)。

摘  要:近似最近邻搜索(ANNS)是计算机领域中一种重要的高效相似度搜索技术,可用于在大规模数据集中进行快速信息检索。随着人们对高精度信息检索的需求不断增长,同时使用结构化信息和非结构化信息进行混合查询的方式也得到了广泛应用。然而,基于近邻图的过滤贪心算法在混合查询时可能会因结构化约束条件的影响导致连通性降低,进而损害搜索精度。为此,提出了一种基于容忍因子的过滤贪心算法,通过容忍因子控制不满足结构化约束条件的顶点参与路由,在不改变索引结构的前提下维持原有近邻图的连通性,克服了结构化约束条件对检索精度的负面影响。实验结果证明,新算法可以在不同结构化约束强度下实现ANNS的高精度搜索,同时保持检索效率。该研究解决了基于近邻图的ANNS在混合查询场景中的问题,为大规模数据集的快速混合查询信息检索提供了一种有效的解决方案。Approximate nearest neighbor search(ANNS)is an important technique in the field of computer science for efficient similarity search,enabling fast information retrieval in large-scale datasets.With the increasing demand for high-precision information retrieval,there is a growing use of the hybrid query method that utilizes both structured and unstructured information for querying,which has broad application prospects.However,filtered greedy algorithms based on nearest neighbor graphs may reduce the connectivity of the graph due to the impact of structural constraints in hybrid queries,ultimately damaging search accuracy.This article proposes a tolerance factor based filtered greedy algorithm,which controls the participation of vertices that do not meet structural constraints in routing through a tolerance factor,maintaining the connectivity of the original nearest neighbor graph without changing the index structure,and overcoming the negative impact of structural constraints on retrieval accuracy.The experimental results demonstrate that the new method can achieve high-precision search for ANNS under different levels of structural constraints,while maintaining retrieval efficiency.This study solves the problem of ANNS based on nearest neighbor graphs in hybrid query scenarios,providing an effective solution for fast hybrid query information retrieval in large-scale datasets.

关 键 词:混合查询 向量检索 最近邻搜索 过滤搜索 

分 类 号:TP391.3[自动化与计算机技术—计算机应用技术] TP18[自动化与计算机技术—计算机科学与技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象