基于动态聚类概念的多维离散数据挖掘研究  

Research on Multidimensional Discrete Data Mining Based on Dynamic Clustering Concept


Authors: Wang Li (王黎); Wang Tao (汪涛)

Affiliations: [1] Luohe Food Engineering Vocational University, Luohe, Henan 462000, China [2] Luohe Institute of Technology, Henan University of Technology, Luohe, Henan 462000, China

Source: Computer Simulation (《计算机仿真》), 2024, No. 12, pp. 565-569 (5 pages)

Abstract: As the volume of stored data grows, redundant records in databases greatly reduce the efficiency of data mining. To address the high duplication rate and low interaction efficiency of multidimensional discrete data in databases, this paper combines the DCM dynamic clustering algorithm with the DE-PSO agglomerative topology optimization algorithm to build a DE-PSO-DCM de-duplication model for multidimensional discrete data. The model consists of three modules: data optimization processing, condensation topology point optimization, and data mining de-duplication. Module one standardizes the discrete data set with the z-score method to resolve inconsistent data units, reduces the dimensionality of the multidimensional data with Pearson analysis to lower the computational load, and applies an interval noise-point elimination algorithm to construct a new, optimized data set. Module two first uses Gaussian (GAUSS) mutation to optimize the DE gene pool and passes on dominant gene operators according to genetic laws, which improves the optimization efficiency of the PSO particle parameters; the initial condensation points are then constructed through repeated iterative optimization. Module three merges the optimized data samples on the basis of the initial condensation point parameters, computes optimized condensation points according to the center-of-gravity rule to improve the validity of the initial clustering, then performs a second optimization of the classification function based on the nearest-distance principle, and, with an inter-class threshold set, completes the clustering de-duplication of the multidimensional discrete data. Simulation results for the baseline data clustering de-duplication models show that, compared with traditional clustering de-duplication models, the P, R, F1 and T parameters of the DE-PSO-DCM model improve on average by 3.43%, 2.29%, 2.55% and 15.80%, respectively; that is, the proposed dynamic clustering de-duplication algorithm achieves the highest accuracy and robustness among the compared models, together with relatively high stability and timeliness. The proposed DE-PSO-DCM dynamic clustering de-duplication algorithm for multidimensional discrete data has important simulation research value for database space management.
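The first module's preprocessing (z-score standardization followed by Pearson-based dimension reduction) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the Pearson analysis is used to reduce dimensions, so the sketch assumes one common approach, dropping any feature whose absolute correlation with an already kept feature reaches a threshold. The function names and the 0.95 threshold are hypothetical.

```python
import math

def zscore_columns(rows):
    """Standardize each column to zero mean and unit (population) variance."""
    out_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        std = std or 1.0  # a constant column maps to all zeros
        out_cols.append([(v - mean) / std for v in col])
    return [list(r) for r in zip(*out_cols)]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # undefined for a constant column; treated as uncorrelated here
    return cov / math.sqrt(va * vb)

def drop_correlated(rows, threshold=0.95):
    """Assumed reduction rule: keep a column only if |r| with every kept column is below threshold."""
    cols = list(zip(*rows))
    kept = []
    for j, col in enumerate(cols):
        if all(abs(pearson(col, cols[k])) < threshold for k in kept):
            kept.append(j)
    reduced = [[row[j] for j in kept] for row in rows]
    return reduced, kept

# Toy data: column 1 repeats column 0 in different units; column 2 is unrelated.
data = [[1.0, 100.0, 5.0],
        [2.0, 200.0, 3.0],
        [3.0, 300.0, 8.0],
        [4.0, 400.0, 1.0]]
standardized = zscore_columns(data)
reduced, kept = drop_correlated(standardized)
```

On the toy data, the second column is a rescaled copy of the first, so standardization makes the two identical and the correlation filter keeps only one of them. The noise-point elimination step mentioned in the abstract is not sketched because the abstract gives no details of it.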

Keywords: discrete multidimensional data; data mining; dynamic clustering

CLC number: TP391.9 [Automation and Computer Technology: Computer Application Technology]

 
