Approximate nearest neighbor search based on product quantization  Cited by: 3


Authors: TAO Jin (陶津); WANG Xiaodong (王晓东) [1]; YAO Yu (姚宇) [1]

Affiliations: [1] Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China; [2] University of Chinese Academy of Sciences, Beijing 100049, China

Source: Journal of Computer Applications (《计算机应用》), 2018, Issue A02, pp. 128-131 (4 pages)

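Funding: Key Research and Development Project of the Science and Technology Department of Sichuan Province (2017SZ0010); Science and Technology Support Program of Sichuan Province (2016JZ0035)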

Abstract: Multimedia data platforms struggle to index and search massive amounts of data efficiently. To address this, a product quantization algorithm is proposed for the approximate nearest neighbor problem. First, based on the characteristics of large-scale indexing and search, a mathematical model is established using the approximate nearest neighbor idea. Then, the high-dimensional feature vector of each data item is split into segments, and each segment is encoded separately with k-nearest-neighbor coding to obtain a compressed code. Next, a decoder matching the coding scheme is built so that a compressed code can be approximately restored to the original feature. Finally, Asymmetric Distance Computation (ADC) is used to compute the distance between an original (uncompressed) vector and a compressed code, and this distance is used to judge the similarity between data items for search. Theoretical analysis shows that, compared with the traditional data search algorithm based on locality-sensitive hashing, the product quantization algorithm with asymmetric distance computation improves search speed by about 1,000 times under the same time and recall conditions.
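The pipeline summarized in the abstract (segment, encode, decode, asymmetric distance) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the standard product quantization formulation, in which each segment is assigned to the nearest centroid of a codebook trained by k-means on that segment; the abstract itself does not give these details. The function names, the parameters `m` (number of segments) and `k` (centroids per segment), and the plain Lloyd iterations are all illustrative choices.

```python
import numpy as np


def train_codebooks(X, m, k, iters=10, seed=0):
    """Train one codebook per segment with plain Lloyd k-means (assumed setup)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    assert d % m == 0, "dimension must split evenly into m segments"
    ds = d // m
    codebooks = np.empty((m, k, ds))
    for j in range(m):
        sub = X[:, j * ds:(j + 1) * ds]
        # Initialise centroids from k distinct training points.
        cent = sub[rng.choice(n, size=k, replace=False)].copy()
        for _ in range(iters):
            dist = ((sub[:, None, :] - cent[None, :, :]) ** 2).sum(-1)
            assign = dist.argmin(1)
            for c in range(k):
                members = sub[assign == c]
                if len(members):  # keep the old centroid if a cluster is empty
                    cent[c] = members.mean(0)
        codebooks[j] = cent
    return codebooks


def encode(X, codebooks):
    """Compress each vector to m centroid indices, one per segment."""
    m, k, ds = codebooks.shape
    codes = np.empty((X.shape[0], m), dtype=np.int64)
    for j in range(m):
        sub = X[:, j * ds:(j + 1) * ds]
        dist = ((sub[:, None, :] - codebooks[j][None, :, :]) ** 2).sum(-1)
        codes[:, j] = dist.argmin(1)
    return codes


def decode(codes, codebooks):
    """Approximately restore vectors by concatenating the selected centroids."""
    m = codebooks.shape[0]
    return np.hstack([codebooks[j][codes[:, j]] for j in range(m)])


def adc_distances(query, codes, codebooks):
    """Squared distances between an uncompressed query and compressed codes."""
    m, k, ds = codebooks.shape
    # Lookup table: squared distance from each query segment to each centroid.
    table = np.stack([
        ((codebooks[j] - query[j * ds:(j + 1) * ds]) ** 2).sum(-1)
        for j in range(m)
    ])  # shape (m, k)
    # Entry [i, j] of the indexed result is table[j, codes[i, j]].
    return table[np.arange(m), codes].sum(1)


# Usage on synthetic data: rank the database by ADC distance to one query.
rng = np.random.default_rng(42)
database = rng.normal(size=(500, 16))
books = train_codebooks(database, m=4, k=32)
db_codes = encode(database, books)
query = rng.normal(size=16)
nearest = np.argsort(adc_distances(query, db_codes, books))[:5]
```

The query stays uncompressed while the database is stored only as codes, which is what makes the distance "asymmetric". Under these assumptions, each database distance costs `m` table lookups and additions once the `m × k` table has been built for the query.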

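Keywords: product quantization; machine learning; approximate nearest neighbor search; clustering algorithm; asymmetric distance computation; inverted index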

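Classification codes: TP391.4 [Automation and Computer Technology: Computer Application Technology]; TP181 [Automation and Computer Technology: Computer Science and Technology]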

 
