Affiliation: [1] College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070
Source: Journal of Computer Applications (《计算机应用》), 2013, No. 1, pp. 202-206 (5 pages)
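Funding: Gansu Province Science and Technology Support Program (090GKCA075); Humanities and Social Sciences Research Project of the Ministry of Education, 2012 (12YJCZH282)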
Abstract: Traditional data stream clustering algorithms are mostly based on distance or density, and both their clustering quality and their processing efficiency are limited. To address these problems, a data stream clustering algorithm based on the dependent function is proposed. First, data points are modeled as matter-elements and the dependent function needed to solve the problem is established. Second, the value of the dependent function is calculated, and its magnitude is used to judge the degree to which a data point belongs to a given cluster. Then, the proposed method is applied within the online-offline framework for data stream clustering. Finally, the algorithm is tested on the real data set KDD-CUP99 and on randomly generated artificial data sets. The experimental results show that the clustering purity of the proposed method is above 92% and that it can process about 6,300 records per second. Compared with traditional algorithms, its processing efficiency is considerably higher, it scales well with the number of dimensions and the number of clusters, and it is suitable for processing large-scale dynamic data sets.
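Illustrative note: the abstract does not give the dependent function itself, so the following Python sketch only shows the general idea of scoring cluster membership with one. It assumes a common textbook form of the elementary dependent function from extension theory, built from the extension distance to a "classical" interval and a strictly larger "joint" interval. The interval representation of a cluster, the averaging over dimensions, and all function names are hypothetical choices made for illustration; they are not taken from the paper, and the online-offline stream framework is not modeled.

```python
def extension_distance(x, lo, hi):
    """Extension distance from x to [lo, hi]: negative inside, zero on the
    boundary, positive outside."""
    return abs(x - (lo + hi) / 2) - (hi - lo) / 2


def dependent_value(x, classical, joint):
    """One assumed form of the elementary dependent function.

    classical = (a, b) is the core interval of the cluster in one dimension;
    joint = (c, d) must strictly contain it. The value is positive inside the
    classical interval, between -1 and 0 in the transition zone, and below -1
    outside the joint interval.
    """
    a, b = classical
    c, d = joint
    if not (c < a < b < d):
        raise ValueError("joint interval must strictly contain the classical interval")
    rho_classical = extension_distance(x, a, b)
    if rho_classical <= 0:  # x lies inside (or on the edge of) the classical interval
        return -rho_classical / (b - a)
    # Strict containment guarantees this denominator is non-zero.
    return rho_classical / (extension_distance(x, c, d) - rho_classical)


def cluster_score(point, cluster):
    """Average the per-dimension dependent values (an assumed aggregation)."""
    values = [dependent_value(x, cl, jt) for x, (cl, jt) in zip(point, cluster)]
    return sum(values) / len(values)


def assign(point, clusters):
    """Return the name of the cluster with the largest score for this point."""
    scores = {name: cluster_score(point, spec) for name, spec in clusters.items()}
    return max(scores, key=scores.get)


# Two hypothetical 2-D clusters, each given per dimension as (classical, joint).
clusters = {
    "A": [((0, 2), (-1, 3)), ((0, 2), (-1, 3))],
    "B": [((5, 7), (4, 8)), ((5, 7), (4, 8))],
}
label_near_a = assign((1, 1), clusters)
label_near_b = assign((6, 6.5), clusters)
```

In this sketch a larger value means a stronger degree of membership, which mirrors the abstract's statement that the magnitude of the dependent function is used to judge how strongly a point belongs to a cluster.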
Classification number: TP311.5 [Automation and Computer Technology: Computer Software and Theory]