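Authors: Zhang Junqi (张军旗)[1], Zhou Xiangdong (周向东)[1], Wang Mei (王梅)[1], Shi Baile (施伯乐)[1]
Affiliation: [1] Department of Computing and Information Technology, Fudan University, Shanghai 200433, China
Source: Journal of Software (《软件学报》), 2008, No. 6, pp. 1401-1412 (12 pages)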
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60403018 and 60773077; the National Basic Research Program of China (973 Program) under Grant No. 2005CB321905; the China Postdoctoral Science Foundation under Grant No. 20070420257; the Natural Science Foundation of Shanghai under Grant No. 04ZR14011; and the Collaboration Plan of AMD with Universities.
Abstract: To improve query efficiency, high-dimensional metric space indexes often use clustering techniques such as K-means to estimate the data distribution. In previous work, however, the clustering parameters are chosen empirically, and the relationship between clustering and query performance has not been analyzed theoretically. This paper presents a new high-dimensional index, a cluster-splitting-based B+-tree for high-dimensional metric spaces. By splitting clusters, the data are partitioned more finely, which reduces the amount of data accessed by a query. The relationship between clustering and query cost is discussed, and, based on a query cost model, formulas are given for computing the clustering parameters, such as the number of cluster splits, that minimize the query cost in theory. Experimental results show that the proposed index is more efficient than iDistance, M-Tree, and sequential scan, and that the estimated optimal number of cluster splits is very close to the clustering parameter that actually yields the best query performance.
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
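
The abstract compares against iDistance, which maps each point to a one-dimensional key built from its cluster and its distance to that cluster's reference point, so that the keys can be stored in a B+-tree. The following is a minimal sketch of that baseline idea only, not the cluster-splitting index or the cost model proposed in the paper. It assumes Euclidean distance, cluster centers that are already given, and a sorted Python list standing in for the B+-tree; the names `build_index`, `range_query`, and the spacing constant `c` are hypothetical.

```python
import bisect
import math


def dist(a, b):
    """Euclidean distance (an assumption; any metric would work)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_index(points, centers, c):
    """Map each point to key = cluster_id * c + distance to its nearest center.

    c must be larger than any point-to-center distance so that the key
    ranges of different clusters do not overlap. The sorted list of
    (key, point) pairs stands in for a B+-tree.
    """
    entries = []
    for p in points:
        i = min(range(len(centers)), key=lambda j: dist(p, centers[j]))
        entries.append((i * c + dist(p, centers[i]), p))
    entries.sort(key=lambda e: e[0])
    return entries


def range_query(index, centers, c, q, r):
    """Return all indexed points within distance r of q."""
    keys = [k for k, _ in index]
    result = []
    for i, center in enumerate(centers):
        d = dist(q, center)
        # By the triangle inequality, a match in cluster i lies at a
        # distance from the center between d - r and d + r.
        lo = i * c + max(0.0, d - r)
        hi = i * c + d + r
        start = bisect.bisect_left(keys, lo)
        end = bisect.bisect_right(keys, hi)
        # Never scan into the key range of the next cluster.
        end = min(end, bisect.bisect_left(keys, (i + 1) * c))
        for _, p in index[start:end]:
            if dist(p, q) <= r:  # filter out false positives
                result.append(p)
    return result


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0), (9.0, 9.0)]
    ctrs = [(0.0, 0.0), (5.0, 5.0)]
    idx = build_index(pts, ctrs, 100.0)
    print(range_query(idx, ctrs, 100.0, (5.0, 5.0), 1.5))
```

In this sketch each cluster contributes one key interval per query, so a finer partition of the data narrows the intervals that have to be scanned. That is the general motivation the abstract gives for splitting clusters; how many splits to use is what the paper's cost model addresses.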