A Multidimensional Data Duplicate Detection Method Based on an Improved R-Tree


Author: HE Jianying (贺建英) (School of Intelligent Manufacturing, Sichuan University of Arts and Science, Dazhou 635000, China)

Affiliation: [1] School of Intelligent Manufacturing, Sichuan University of Arts and Science, Dazhou 635000, Sichuan, China

Source: Electronic Design Engineering (《电子设计工程》), 2023, No. 3, pp. 74-80 (7 pages)

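Funding: Key Project of the Sichuan Revolutionary Old Area Development Research Center (SLQ2020SA-01, SLQ2021BA-01); Teaching Reform Project of Sichuan University of Arts and Science (2020JZ016, 2020JZ001).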

Abstract: To address duplicate detection and deduplication of high-dimensional data in the big-data era, this paper exploits the properties of clustering: R-trees are built with NSKSA, which produces more compact clusters, yielding a better spatial index structure and fewer accesses to spatial nodes. An improved algorithm, ADDR, is then used to make duplicate detection on multidimensional data more efficient. Experiments show that NSKSA builds more compact R-trees than the DKSC and TGS algorithms, and as a result the improved ADDR algorithm achieves a duplicate detection rate nearly 5% higher than DDR. The results indicate that the proposed NSKSA and ADDR algorithms effectively improve the duplicate detection rate for multidimensional data.
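
The central idea of the abstract, that grouping points more tightly gives tighter bounding rectangles and so fewer node visits during a duplicate check, can be illustrated with a minimal sketch. This is not the NSKSA or ADDR algorithm, whose details are not given in this record. The sort-and-chunk packing below is a simple stand-in for clustering, and all function names (`build_leaves`, `may_contain`, `is_duplicate`) and the tolerance `eps` are hypothetical choices made for illustration only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (lower corner, upper corner)


def mbr(points: Sequence[Point]) -> Box:
    """Minimum bounding rectangle of a non-empty group of points."""
    dims = range(len(points[0]))
    lo = tuple(min(p[d] for p in points) for d in dims)
    hi = tuple(max(p[d] for p in points) for d in dims)
    return lo, hi


def build_leaves(points: Sequence[Point], capacity: int) -> List[Tuple[Box, List[Point]]]:
    """Pack points into leaf groups by sorting and chunking.

    Assumption: this is a stand-in for a clustering step, not NSKSA.
    """
    ordered = sorted(points)
    groups = [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]
    return [(mbr(g), g) for g in groups]


def may_contain(box: Box, q: Point, eps: float) -> bool:
    """True if q lies inside the box expanded by eps in every dimension."""
    lo, hi = box
    return all(l - eps <= x <= h + eps for l, x, h in zip(lo, q, hi))


def is_duplicate(leaves, q: Point, eps: float = 0.0) -> Tuple[bool, int]:
    """Return (duplicate found?, number of leaves actually scanned)."""
    visited = 0
    for box, group in leaves:
        if not may_contain(box, q, eps):
            continue  # pruned: no point in this leaf can match q
        visited += 1
        if any(all(abs(a - b) <= eps for a, b in zip(p, q)) for p in group):
            return True, visited
    return False, visited


data = [(1.0, 2.0, 3.0), (1.1, 2.1, 3.1), (9.0, 9.0, 9.0), (9.5, 8.5, 9.2)]
leaves = build_leaves(data, capacity=2)
print(is_duplicate(leaves, (9.0, 9.0, 9.0)))  # (True, 1)
print(is_duplicate(leaves, (5.0, 5.0, 5.0)))  # (False, 0)
```

The second value in each result counts the leaves that had to be scanned. The more compact the groups, the more leaves are rejected by the bounding-rectangle test alone, which is the effect the abstract attributes to a more compact R-tree.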

Keywords: clustering; R-tree; duplicate detection; high-dimensional data

Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
