Source: Journal of Taiyuan Normal University (Natural Science Edition), 2017, No. 1, pp. 46-52 (7 pages).
Funding: National Natural Science Foundation of China project "Theory and Methods of Sparse Representation of User Behavior Data" (61273294); Shanxi Province Key Science and Technology Research Program project "Research on High-Speed Network Intrusion Detection Technology" (20110321024-02).
Abstract: Spectral clustering suffers from bottlenecks in both storage space and computation time when the data set is large. This paper analyzes two common remedies: spectral clustering based on a sparse t-nearest-neighbor graph and spectral clustering based on Nyström low-rank matrix approximation. To further improve the accuracy of these two methods, a scheme is proposed in which the similarity between samples is computed with a Euclidean distance weighted by information-entropy-based attribute weights. First, the weight of each attribute of the samples is computed; then the sample similarity matrix is built and applied to both the sparse t-nearest-neighbor and the Nyström low-rank approximation spectral clustering methods; finally, the approach is validated on multiple data sets. Experimental results show that on several data sets the proposed method achieves higher clustering accuracy than the original spectral clustering algorithms; on the Pendigits data set in particular, the entropy-weighted sparse t-nearest-neighbor spectral clustering improves accuracy by 15.11% over the plain sparse t-nearest-neighbor method.
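The following minimal Python sketch illustrates the pipeline the abstract describes: entropy-based attribute weighting, a weighted Euclidean (Gaussian) similarity matrix, sparsification to a t-nearest-neighbor graph, and spectral clustering on the result. The specific entropy-weight formula, the Gaussian kernel, and the parameters sigma and t are illustrative assumptions rather than the paper's exact formulation, and the Nyström low-rank variant is not shown.

# A minimal sketch (not the paper's reference implementation) of
# entropy-weighted similarity feeding sparse t-nearest-neighbor
# spectral clustering.  The entropy-weight formula, the Gaussian
# similarity, sigma and t are assumptions made for illustration.
import numpy as np
from sklearn.cluster import SpectralClustering

def entropy_attribute_weights(X, eps=1e-12):
    """Entropy-weight method: attributes with lower entropy (i.e. more
    discriminative) receive larger weights."""
    X = np.asarray(X, dtype=float)
    # Min-max normalize each attribute to [0, 1].
    span = X.max(axis=0) - X.min(axis=0)
    Xn = (X - X.min(axis=0)) / (span + eps)
    # Treat each column as a distribution over samples.
    P = Xn / (Xn.sum(axis=0) + eps)
    entropy = -np.sum(P * np.log(P + eps), axis=0) / np.log(X.shape[0])
    w = 1.0 - entropy
    return w / (w.sum() + eps)

def weighted_rbf_affinity(X, w, sigma=1.0):
    """Gaussian affinity built from the attribute-weighted Euclidean distance."""
    diff = X[:, None, :] - X[None, :, :]        # pairwise differences
    d2 = np.einsum('ijk,k->ij', diff ** 2, w)   # weighted squared distances
    return np.exp(-d2 / (2.0 * sigma ** 2))

def sparsify_t_nearest(A, t=10):
    """Keep each sample's t strongest affinities and symmetrize the graph."""
    S = np.zeros_like(A)
    for i in range(A.shape[0]):
        idx = np.argsort(A[i])[-(t + 1):]       # t neighbors plus the point itself
        S[i, idx] = A[i, idx]
    return np.maximum(S, S.T)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic Gaussian clusters stand in for a real data set.
    X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(5, 1, (50, 4))])
    w = entropy_attribute_weights(X)
    A = sparsify_t_nearest(weighted_rbf_affinity(X, w, sigma=2.0), t=10)
    labels = SpectralClustering(n_clusters=2, affinity='precomputed',
                                random_state=0).fit_predict(A)
    print(labels)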
Keywords: spectral clustering; information entropy; sparse t-nearest-neighbor; Nyström low-rank matrix approximation
CLC number: TP181 [Automation and Computer Technology - Control Theory and Control Engineering]