一种基于主存Δ-tree的高维数据KNN连接算法  被引量:7

A KNN-Join Algorithm Based on Δ-Tree for High-Dimensional Data

在线阅读下载全文

作  者:刘艳[1,2] 郝忠孝[1,3] 

机构地区:[1]哈尔滨理工大学计算机科学与技术学院,哈尔滨150080 [2]长春大学软件学院,长春130022 [3]哈尔滨工业大学计算机科学与技术学院,哈尔滨150001

出  处:《计算机研究与发展》2010年第7期1234-1243,共10页Journal of Computer Research and Development

基  金:黑龙江省自然科学基金项目(F200601)

摘  要:KNN连接作为数据挖掘的基元,可以用来大幅度提高相似搜索、数据分析和数据挖掘的速度.到目前为止,对KNN连接的研究主要在基于磁盘系统的背景下进行,假设数据库太大以至于不能装入主存.随着RAM越来越大,价格也越来越低廉,这种假设逐渐受到挑战.因此,有必要重新对基于主存的KNN连接进行研究.在高效主存索引的基础上,采用编码解码、自底向上、深度优先遍历和剪枝等技术提出了一种新的KNN连接算法Δ-tree-KNN-Join.该算法解决了KNN连接中确定搜索半径困难的问题,提高了连接效率.在真实数据和合成聚类数据上进行了实验,结果显示Δ-tree-KNN-Join是一种有效的主存KNN连接算法.KNN-Join is an important database primitive, which has been successfully applied to speed up applications such as similarity search, data analysis and data mining. Up to now, the KNN-Join has been studied in the context of disk-based systems where it is assumed that the databases are too large to fit into the main memory. This assumption is increasingly being challenged as RAM gets cheaper and larger. Therefore, it is necessary to study the problem of KNN-Join in the main memory. △-tree is a novel multi-level index structure that can speed up the high-dimensional query in the main memory environment. In this paper, a new KNN-Join approach is proposed, which uses △-tree as the underlying index structure, exploits coding and decoding, bottom up, depth first search and pruning technique. The problem that it is hard to identify the distance from each point in R to its K nearest neighbors in S is solved and the Join efficiency is improved. The correctness verification and cost analysis of the above mentioned algorithm are presented. Extensive experiments on both real datasets and synthetic clustered datasets are conducted, and the results illustrate that △-tree-KNN-Join is an effieient KNN-Join method in main memory.

关 键 词:相似连接 KNN连接 高维空间 主存 数据挖掘 

分 类 号:TP311.13[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象