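Authors: ZHU Zheng-yu (朱峥瑜); SONG Yan (宋燕)[1] (Control Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
Affiliation: [1] School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2021, No. 12, pp. 2545-2552 (8 pages)
Funding: Supported by the National Natural Science Foundation of China (62073223) and the Natural Science Foundation of Shanghai (18ZR1427100).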
Abstract: In the current big-data era of information explosion, incomplete data are ubiquitous in data clustering. The traditional fuzzy c-means (FCM) algorithm has several drawbacks, such as a tendency to fall into local optima and inadequate consideration of feature information, and these drawbacks can seriously degrade clustering results when data are missing, especially for sparse datasets. To address this problem, this paper proposes an FCM algorithm for incomplete data based on multiple information. First, the partial distance strategy is adopted to give a formula for the sum of squared intra-cluster distances on incomplete data. Second, dynamic feature weights and inter-cluster distance information are fully taken into account to improve the accuracy of the algorithm. Third, a particle swarm optimization algorithm is used to perform the clustering; its strong global search capability overcomes the sensitivity of traditional FCM to the initial cluster centers and its tendency to become trapped in local optima. Finally, comparison experiments on UCI public datasets with different missing rates show that the proposed algorithm not only avoids falling into local optima but also effectively improves clustering accuracy on incomplete data.
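The abstract names the partial distance strategy but does not reproduce its formula. As a rough illustration only, the sketch below follows the commonly used form of that strategy: the squared distance is computed over the observed features and rescaled by the ratio of total to observed feature count. The function name `partial_distance_sq` and the use of NaN to mark missing values are assumptions made for this example, and the paper's own formula (which also involves dynamic feature weights) may differ.

```python
import math


def partial_distance_sq(x, v):
    """Squared partial distance between a sample x and a cluster center v.

    Assumption for this sketch: missing entries of x are encoded as NaN.
    Only observed features contribute, and the sum is rescaled by
    (total features / observed features).
    """
    if len(x) != len(v):
        raise ValueError("x and v must have the same number of features")
    # Keep only the coordinate pairs where the sample value is observed.
    observed = [(xj, vj) for xj, vj in zip(x, v) if not math.isnan(xj)]
    if not observed:
        raise ValueError("sample has no observed features")
    scale = len(x) / len(observed)
    return scale * sum((xj - vj) ** 2 for xj, vj in observed)


if __name__ == "__main__":
    sample = [1.0, float("nan"), 3.0]   # second feature is missing
    center = [0.0, 5.0, 1.0]
    print(partial_distance_sq(sample, center))
```

For a sample with no missing values the scale factor is 1, so the function reduces to the ordinary squared Euclidean distance.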
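Keywords: incomplete data; inter-cluster distance; dynamic feature weights; fuzzy c-means clustering; particle swarm optimization algorithm
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]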