BC-iDistance:基于位码的优化高维索引  被引量:3

BC-iDistance:Bit-code Based Optimal High-dimensional Index

在线阅读下载全文

作  者:梁俊杰[1] 冯玉才[1] 

机构地区:[1]华中科技大学计算机学院

出  处:《小型微型计算机系统》2007年第9期1647-1651,共5页Journal of Chinese Computer Systems

基  金:国家"八六三"基金项目([2005]555)资助

摘  要:在高维空间KNN查询算法中,近似向量和一维转换表示法能有效克服维数灾难,本文结合这两种思想,提出一种基于位码的优化高维索引结构(BC-iDistance).针对iDistance缺点,高维向一维转换引起的大量数据信息丢失,BC-iDistance不仅利用一维距离表示点对象和参考点间的远近关系,而且引入位码近似表示它们之间的位置关系,将高维向量压缩为二维向量表示.利用特殊的B+树组织,KNN检索时实现两层剪枝处理,降低I/O和距离计算代价.采用模拟数据和真实数据,实验验证了优化后的索引具有更高的检索效率.In the recent literature, a variety of index structures have been proposed to facilitate high-dimensional KNN queries, among which the techniques of approximate vector presentation and one dimensional transformation can efficiently break the curse of dimensionality. Based on the two techniques above, an optimal index is proposed, called Bit-Code based iDistance (BC- iDistance). To overcome the drawback of the iDistance that one-dimensional transformation can incur much information loss, the BC-iDistance utilizes a novel representation that compactly represent a d-dimension vector as a 2-dimension vector: The first component is a distance that reflects the similarity of the d-dimension data point with respect to the corresponding reference point and the second component is a bit code, with one bit per dimension, that indicates which side of the reference it lies. This representation enables two levels filtering; the first component prunes away points that do not share similar distance ranges, while the second component filters away points by the lower bound distance based on their bit codes. Moreover, the representation facilitates the use of a single index structure to further speed up processing. We employ the classical B+-tree for this purpose. The results of our experiments, using both synthetic and real data, demonstrate that the BC-iDistance outperforms the iDistance for KNN search in high-dimensional spaces.

关 键 词:高维索引 KNN查询位码 近似向量 

分 类 号:TP311[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象