基于多级索引的高维数据近似最近邻搜索  被引量:4

Approximate Nearest Neighbor Search Algorithm for High Dimensional Data Based on Multi Level Index

在线阅读下载全文

作  者:杨凤丽 李娜 刘仁芬 YANG Feng-li;LI Na;LIU Ren-fen(Sifang College,Shijiazhuang Tiedao University,Shijiazhang Hebei 051132,China)

机构地区:[1]石家庄铁道大学四方学院,河北石家庄051132

出  处:《计算机仿真》2022年第11期398-401,共4页Computer Simulation

摘  要:当前的高维数据最近邻搜索方法大多应用单级索引,导致近邻搜索稳定性较差,且时间开销较大。为此提出基于多级索引的高维数据近似最近邻搜索方法。利用二级距离敏感哈希算法(M2LSH)实现多级索引。将第一次哈希处理的高维数据输入哈希桶内,使用二次哈希映射桶号,使其在一维空间中呈现。依据各桶内数据量完成临近哈希桶合并,将新哈希桶作为候选搜索集合,实现高维数据近似最近邻搜索。实验结果表明:不同相邻桶距离下,所提算法优化后的近似比率均可保持在1左右,搜索效果大幅度提升,且稳定性较好;将该算法的哈希函数数量和哈希桶宽度分别设置为12、3,能获得更优异的搜索效果,并极大地节省时间开销,说明多级索引是处理高维数据近似最近邻问题的有效方法。The current nearest neighbor search method for high-dimensional data use single-level index,reslting in poor search stability and high time cost.Therefore,we reported an approximate nearest neighbor search method for high-dimensional data based on multi-level index.The two-level distance sensitive hash algorithm(M2 LSH)was introduced for completing the multi-level index.The high-dimensional data after the first hash processing were input into the hash bucket to use the secondary hash mapping bucket number to render it in one-dimensional space.Based on the amount of data in each bucket,adjacent hash buckets were merged.The new hash bucket was used as a candidate search set to realize the approximate nearest neighbor search of high-dimensional data.The experimental results show that the approximate ratio of the proposed algorithm after optimization can be maintained at about 1 under different adjacent bucket distances,and the search effect is greatly improved and the stability is good;Setting the number of hash functions and the width of the hash bucket of the algorithm to 12 and 3 respectively can achieve better search results and greatly save time.It shows that multi-level indexing is an effective method to deal with the problem of high-dimensional data approaching the nearest neighbor.

关 键 词:多级索引 高维数据 近似最近邻 距离敏感哈希 哈希桶 

分 类 号:TP311.3[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象