基于分区层次图的海量高维数据学习索引构建方法  

Learning indexing method for massive high-dimensional data based on partitioned hierarchical graph

在线阅读下载全文

作  者:华悦琳 周晓磊[2,3] 范强 王芳潇 严浩 HUA Yue-lin;ZHOU Xiao-lei;FAN Qiang;WANG Fang-xiao;YAN Hao(School of Computer Science,Nanjing University of Information Science&Technology,Nanjing 210044;The 63rd Research Institute,National University of Defense Technology,Nanjing 210007;Laboratory for Big Data and Decision,National University of Defense Technology,Changsha 410073,China)

机构地区:[1]南京信息工程大学计算机学院、网络空间安全学院,江苏南京210044 [2]国防科技大学第六十三研究所,江苏南京210007 [3]国防科技大学大数据与决策实验室,湖南长沙410073

出  处:《计算机工程与科学》2024年第7期1193-1201,共9页Computer Engineering & Science

摘  要:学习索引是破解海量高维数据近似最近邻搜索问题的关键。然而,现有学习索引技术结果仅局限于单个分区中,且依赖于近邻图的构建。随着数据维度和规模的增长,索引难以对分区边界数据进行精确判断,并且构建时间复杂度增大,可扩展性难以保障。针对上述问题,提出了基于分区层次图的学习索引方法PBO-HNSW。该方法对分区边界数据进行重新分配,并行构建分布式图索引结构,从而有效应对近似最近邻搜索问题所面临的挑战。实验结果表明,该方法能够在百万级海量高维数据上实现毫秒级的索引构建。当召回率为0.93时,PBO-HNSW方法构建时间仅为基线方法的36.4%。Learning to index is the key to solving the problem of approximate nearest neighbor search in massive high-dimensional data.However,existing learning to index techniques are limited to individual partitions and rely on the construction of neighborhood graph.As the dimensionality and scale of data grow,indexing struggles to accurately judge boundary data of partitions,leading to increased construction time complexity and challenges in scalability.To address the above problems,a learn to index method based on partitioned hierarchical graphs,PBO-HNSW is proposed.The method redistributes partition boundary data and constructs distributed graph index structures in parallel.It effectively addresses the challenges faced by the approximate nearest neighbor search problem.Experimental results show that PBO-HNSW method is able to achieve millisecond index construction on millions of massive high-dimensional data.When the recall is 0.93,the construction time of the PBO-HNSW method is only 36.4%of baseline methods.

关 键 词:近似最近邻搜索 学习索引 层次可导航小世界图 分区学习 索引结构 

分 类 号:TP391[自动化与计算机技术—计算机应用技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象