Authors: LIU Ren-fen (刘仁芬); YANG Feng-li (杨凤丽); WANG Xia (王霞) (Shijiazhuang Tiedao University Sifang College, Shijiazhuang, Hebei 051132, China)
Affiliation: [1] Shijiazhuang Tiedao University Sifang College, Shijiazhuang, Hebei 051132, China
Source: Computer Simulation (《计算机仿真》), 2022, No. 12, pp. 383-386, 444 (5 pages)
Funding: Simulation of an Incremental Update Method for Private Information in Distributed Databases (2017ZY 0725).
Abstract: Existing incremental clustering algorithms neglect dimensionality reduction, so they cannot cluster high-dimensional data that has many attributes. This paper proposes an incremental clustering algorithm for high-dimensional data based on improved Spark technology. A chaotic partition method reorganizes the high-dimensional data structure and yields the distribution trajectories of the fuzzy data. A sparse dimensionality reduction algorithm based on information entropy then filters the high-dimensional data features in the distribution space, which completes the dimensionality reduction. With the improved Spark technology, a parallel incremental clustering optimization algorithm for high-dimensional data is designed: it detects the correlation between the reduced data features, fuses those features, determines the cluster centers, and then completes the incremental clustering. Test results show that when the embedding dimension of the high-dimensional data is 7, the reorganization effect of the algorithm is good, the dimensionality of the dataset is effectively reduced, and storage space occupancy is lowered, so the algorithm can cluster high-dimensional data effectively and reliably.
Keywords: high-dimensional data; incremental clustering; data dimensionality reduction; structure reorganization; increment ratio
Classification code: TP319 [Automation and Computer Technology: Computer Software and Theory]
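
The abstract describes a pipeline in which entropy-based feature filtering is followed by incremental clustering. The sketch below illustrates that general idea only and is not the paper's algorithm: it omits the chaotic partition step and Spark parallelization entirely. It assumes that higher-entropy features are the ones worth keeping, and it substitutes a simple leader-style clusterer with a fixed radius for the paper's clustering step. All names (`feature_entropy`, `select_features`, `IncrementalClusterer`, `reduce_and_cluster`) and parameters (`bins`, `keep`, `radius`) are hypothetical.

```python
import math
from collections import Counter


def feature_entropy(values, bins=4):
    """Shannon entropy (in bits) of one feature after equal-width binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant feature carries no information
    width = (hi - lo) / bins
    # Clamp the maximum value into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def select_features(rows, keep):
    """Indices of the `keep` highest-entropy columns (assumed selection rule)."""
    columns = list(zip(*rows))
    ranked = sorted(range(len(columns)),
                    key=lambda j: feature_entropy(columns[j]),
                    reverse=True)
    return sorted(ranked[:keep])


class IncrementalClusterer:
    """Leader-style clustering: points arrive one at a time."""

    def __init__(self, radius):
        self.radius = radius  # assumed distance threshold for joining a cluster
        self.centers = []
        self.counts = []

    def add(self, point):
        """Assign `point` to the nearest center, or open a new cluster."""
        best, best_dist = None, float("inf")
        for i, center in enumerate(self.centers):
            d = math.dist(point, center)
            if d < best_dist:
                best, best_dist = i, d
        if best is None or best_dist > self.radius:
            self.centers.append(list(point))
            self.counts.append(1)
            return len(self.centers) - 1
        # Update the matched center as a running mean of its members.
        self.counts[best] += 1
        n = self.counts[best]
        self.centers[best] = [c + (x - c) / n
                              for c, x in zip(self.centers[best], point)]
        return best


def reduce_and_cluster(rows, keep, radius):
    """Filter features by entropy, then cluster the reduced rows incrementally."""
    idx = select_features(rows, keep)
    model = IncrementalClusterer(radius)
    labels = [model.add([row[j] for j in idx]) for row in rows]
    return idx, labels
```

Calling `reduce_and_cluster(rows, keep=2, radius=2.0)` on a list of equal-length numeric rows returns the retained column indices and one cluster label per row. Because points are processed in arrival order, newly arriving rows can be passed to the same `IncrementalClusterer` without reclustering earlier data.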