检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:江润本 陈家颖 毛睿 JIANG Runben;CHEN Jiaying;MAO Rui(College of Computer Science and Software Engineering,Guangdong Provincial Key Laboratory of Popular High Performance Computers,Guangdong Provincial Engineering Center of China-made High Performance Data Computing System,Shenzhen Key Laboratory of Service Computing and Applications,Shenzhen University,Shenzhen 518060,Guangdong Province,P.R.China)
机构地区:[1]深圳大学计算机与软件学院,广东省普及型高性能计算机重点实验室,广东省国产高性能数据计算系统工程技术研究中心,深圳市服务计算与应用重点实验室,广东深圳518060
出 处:《深圳大学学报(理工版)》2023年第6期640-648,共9页Journal of Shenzhen University(Science and Engineering)
基 金:National Natural Science Foundation of China (62072311)。
摘 要:现有的度量空间的近似最近邻搜索(approximate nearest neighbor search, ANNS)方法通常依赖于预选择的支撑点构成的序列,序列中的支撑点按照到数据元素的距离升序排列.然而,大多数现有的度量空间ANNS方法由于索引结构复杂、支撑点过多或者未能充分利用距离信息导致搜索时内存开销巨大.为此,提出精简排列阵(reduced permutation array, RPA)的度量空间recall@R近似最近邻搜索方法.对于全体数据元素,RPA预先选择k个支撑点,对每个数据元素仅存储离该数据元素最近的l个(l<<k),并将所有元素的支撑点序列构建为一个数组结构.在搜索过程中,利用一种得分函数,该函数基于查询对象到各个支撑点的距离来近似计算数据元素到查询对象的距离.同时,维护一个有界最小堆,以保存R个候选结果数据元素.RPA具有结构简单、内存效率高和可扩展性强等特点.实验结果表明,在相同召回率的情况下,与排列索引(permutation-based index, P-index)相比,RPA平均具有高达3倍的内存压缩比.研究结果可在内存资源有限的单机环境下提供一种有效的针对海量数据的ANNS方法.Approximate nearest neighbor search(ANNS)for high dimensional data has received extensive research efforts.Many existing ANNS methods in metric spaces are essentially based on the permutation of pivots,or pre-selected reference points,which are ordered by their distances to the data.Unfortunately,most of these permutation-based methods are quite memory demanding,because of either complexity of index structure,excessive pivots,or insufficient exploitation of distances.This paper proposes the reduced permutation array(RPA),a delicate integration of existing technical elements based on in-depth understanding,for recall@R ANNS in metric spaces.As an array,for each data element,RPA stores only l identities(IDs)of its nearest pivots out of k pre-selected ones(l≪k).During search,the order of distances from each data element to the query is approximated by a score function based on the distances from its nearest pivots to the query,and a bounded heap is maintained to store R candidate results.RPA is simple,memory-efficient,and scalable.Experiments show that RPA is approximately 3 times more memory-efficient than existing methods on average.The research results can provide an effective ANNS method for massive data in a single-machine environment with limited memory resources.
关 键 词:计算机科学与技术 近似最近邻搜索 度量空间 索引结构 支撑点选择 支撑点序列 内存高效
分 类 号:TP301.6[自动化与计算机技术—计算机系统结构]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.7