Authors: 邹志文 (ZOU Zhiwen)[1]; 秦程 (QIN Cheng)
Affiliation: [1] School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China
Source: 《计算机应用》 (Journal of Computer Applications), 2021, No. 3, pp. 733-737 (5 pages)
Funding: Zhenjiang Key Research and Development Program (Industry Foresight and Common Key Technologies) project (GY2017025).
Abstract: Existing R-tree spatial clustering techniques usually select cluster centers either by random designation or by computing the Euclidean distance between spatial data, without considering the topic relevance between spatial data. As a result, the clustering result is influenced by the initial value of k, and the association between spatial data is based only on geographic location. To address this, a method for dynamically constructing a spatial Topic R-tree (TR-tree) based on k-means++ is proposed. First, on top of the traditional k-means++ algorithm, the k clusters are determined dynamically by a clustering measure function, and the Latent Dirichlet Allocation (LDA) model is introduced into the clustering measure function to calculate the topic probability of each spatial data text, which strengthens the topic relevance between spatial data. Second, the cluster center with the highest probability is selected according to the topic probabilities. Finally, the TR-tree is constructed, with spatial data allocated dynamically during construction. Experimental results show that, although the R-tree construction time increases slightly, the method significantly improves indexing efficiency and the correlation between nodes compared with algorithms that construct the R-tree by clustering on geographic location alone.
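Keywords: R-tree; k-means++; clustering; indexing efficiency; Latent Dirichlet Allocation (LDA) model
Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]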
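The seeding idea in the abstract (k-means++ center selection that accounts for topic relevance as well as location) can be illustrated with a minimal sketch. This is not the authors' clustering measure function, which the abstract does not specify. It assumes a hypothetical combined distance, a weighted mix of Euclidean distance and the Hellinger distance between per-item topic distributions, and assumes those topic distributions have already been produced by an LDA model. The names `combined_distance`, `kmeanspp_seed`, and `alpha` are illustrative only. The dynamic choice of k and the TR-tree construction are not shown.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def hellinger(t1, t2):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    return math.sqrt(
        sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(t1, t2)) / 2
    )


def combined_distance(a, b, alpha=0.5):
    """Hypothetical measure (an assumption, not from the paper):
    alpha * spatial distance + (1 - alpha) * topic distance.
    Each item is a pair (location, topic_distribution)."""
    (loc_a, topics_a), (loc_b, topics_b) = a, b
    return alpha * euclidean(loc_a, loc_b) + (1 - alpha) * hellinger(topics_a, topics_b)


def kmeanspp_seed(items, k, alpha=0.5, rng=None):
    """Standard k-means++ seeding using the combined distance.
    Assumes k is no larger than the number of distinct items."""
    rng = rng or random.Random(0)
    centers = [rng.choice(items)]
    while len(centers) < k:
        # Weight each item by its squared distance to the nearest chosen center.
        weights = [
            min(combined_distance(x, c, alpha) for c in centers) ** 2
            for x in items
        ]
        centers.append(rng.choices(items, weights=weights, k=1)[0])
    return centers


# Toy data: (x, y) location plus a two-topic distribution assumed to come from LDA.
items = [
    ((0.0, 0.0), (0.9, 0.1)),
    ((0.1, 0.0), (0.8, 0.2)),
    ((5.0, 5.0), (0.1, 0.9)),
    ((5.1, 5.0), (0.2, 0.8)),
]
centers = kmeanspp_seed(items, k=2, rng=random.Random(1))
```

With `alpha=1.0` the sketch reduces to ordinary location-only k-means++ seeding; lowering `alpha` makes items with dissimilar topics more likely to become separate centers even when they are geographically close.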