Cluster Splitting Based High Dimensional Metric Space Index B^+-Tree  Cited by: 23


Authors: Zhang Junqi (张军旗)[1], Zhou Xiangdong (周向东)[1], Wang Mei (王梅)[1], Shi Baile (施伯乐)[1]

Affiliation: [1] Department of Computing and Information Technology, Fudan University, Shanghai 200433

Source: Journal of Software (《软件学报》), 2008, No. 6, pp. 1401-1412 (12 pages)

Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60403018 and 60773077; the National Basic Research Program of China (973 Program) under Grant No. 2005CB321905; the Postdoctoral Science Foundation Funded Project of China under Grant No. 20070420257; the Natural Science Foundation of Shanghai of China under Grant No. 04ZR14011; and the Collaboration Plan of AMD with Universities

Abstract: To improve index performance, high-dimensional metric space indexes often use clustering techniques such as K-means to capture the data distribution. However, existing work chooses the clustering parameters empirically and lacks a theoretical analysis of the relationship between clustering and query performance. This paper presents a cluster splitting based high-dimensional metric space B^+-tree index. Cluster splitting partitions the data more finely, which reduces the amount of data accessed during a query. The relationship between clustering and query cost is discussed, and, based on a query cost model, formulas are given for computing parameters such as the number of cluster splits that minimizes the query cost in theory. Experimental results show that the proposed index clearly outperforms iDistance, M-Tree, and sequential scan, and that the estimated optimal number of cluster splits is close to the clustering parameter that gives the best query performance in practice.
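For readers unfamiliar with the baseline, the following is a minimal sketch of the general iDistance-style idea of mapping clustered points to one-dimensional keys. It is not the paper's cluster splitting algorithm or its cost model, neither of which is given in the abstract. The class name `OneDimIndex`, the stride parameter `c`, and the sorted list standing in for the leaf level of a B^+-tree are all assumptions made for illustration.

```python
import bisect
import math


def dist(a, b):
    """Euclidean distance; any metric obeying the triangle inequality would do."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class OneDimIndex:
    """Hypothetical stand-in for a B+-tree: a sorted list of 1-D keys.

    Each point p assigned to cluster i gets the key i * c + dist(p, center_i).
    The stride c is assumed to exceed every cluster radius, so the key
    ranges of different clusters never overlap.
    """

    def __init__(self, points, centers, c):
        self.centers = centers
        self.c = c
        self.radius = [0.0] * len(centers)
        entries = []
        for p in points:
            # Assign p to its nearest center (ties go to the lower index).
            i = min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            d = dist(p, centers[i])
            self.radius[i] = max(self.radius[i], d)
            entries.append((i * c + d, p))
        entries.sort(key=lambda e: e[0])
        self.keys = [k for k, _ in entries]
        self.points = [p for _, p in entries]

    def range_query(self, q, r):
        """Return (points within distance r of q, number of candidates scanned)."""
        hits, scanned = [], 0
        for i, center in enumerate(self.centers):
            dq = dist(q, center)
            if dq - r > self.radius[i]:
                continue  # the query ball cannot reach this cluster
            # Triangle inequality: a hit p satisfies |dist(p, center) - dq| <= r.
            lo = i * self.c + max(0.0, dq - r)
            hi = i * self.c + min(self.radius[i], dq + r)
            start = bisect.bisect_left(self.keys, lo)
            stop = bisect.bisect_right(self.keys, hi)
            for p in self.points[start:stop]:
                scanned += 1
                if dist(p, q) <= r:
                    hits.append(p)
        return hits, scanned
```

In this sketch, `scanned` counts the candidates read from the key ranges, which plays the role of the data access cost that the abstract says finer partitioning is meant to reduce. How the number of partitions should be chosen is exactly what the paper's cost model addresses, and is left open here.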

Keywords: high-dimensional space; index structure; query cost model; cluster splitting

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
