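Affiliations: [1] Computer Center, Department of Experimental Teaching, Guangdong University of Petrochemical Technology, Maoming 525000, Guangdong, China; [2] College of Computer and Electronic Information, Guangdong University of Petrochemical Technology, Maoming 525000, Guangdong, China
Source: Computer Applications and Software (《计算机应用与软件》), 2014, No. 1, pp. 105-107, 139 (4 pages)
Funding: National Natural Science Foundation of China (60903168); Guangdong Province and Ministry of Education Industry-University-Research Cooperation Project (2010B090400235); Maoming Science and Technology Plan Project (2011008)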
Abstract: Microblog posts are numerous, short, and cover a wide range of topics, so microblog data contain many isolated points (outliers), which degrade clustering algorithms for hot-topic discovery. To address this, the paper proposes a microblog hot-topic discovery method based on outlier elimination. The method first removes the outliers from the dataset, then clusters the remaining data, which still have clustering value, with the CURE (Clustering Using Representatives) algorithm, and finally verifies the algorithm's effectiveness on examples. The results show that, relative to the clustering algorithms used for comparison, the proposed algorithm reduces the sensitivity of the clustering result to outliers, improves the accuracy of hot-topic discovery, and runs more efficiently, making it better suited to large-scale microblog hot-topic discovery.
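The two-stage idea in the abstract (drop outliers, then cluster what remains with a representative-based method) can be illustrated with a minimal sketch. The abstract does not say how outliers are detected, so the k-th-nearest-neighbour distance threshold below is an assumption, and `cure_like` is a simplified stand-in for CURE that keeps only its core idea (a few scattered representatives per cluster, shrunk toward the centroid) and omits the sampling and partitioning steps of the full algorithm. All function and parameter names are hypothetical, and the 2-D points stand in for whatever feature vectors the paper derives from microblog text.

```python
import math
from itertools import combinations

def remove_outliers(points, k=2, max_knn_dist=1.5):
    """Assumed criterion: keep points whose k-th nearest neighbour is close."""
    kept = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        if len(d) >= k and d[k - 1] <= max_knn_dist:
            kept.append(p)
    return kept

def centroid(members):
    n = len(members)
    return tuple(sum(axis) / n for axis in zip(*members))

def representatives(members, n_rep=3, shrink=0.3):
    """Pick well-scattered points, then shrink them toward the centroid."""
    c = centroid(members)
    distinct = list(dict.fromkeys(members))  # drop exact duplicates
    chosen = [max(distinct, key=lambda p: math.dist(p, c))]
    while len(chosen) < min(n_rep, len(distinct)):
        rest = [p for p in distinct if p not in chosen]
        # farthest-first: take the point farthest from those already chosen
        chosen.append(max(rest, key=lambda p: min(math.dist(p, r) for r in chosen)))
    return [tuple(x + shrink * (cx - x) for x, cx in zip(p, c)) for p in chosen]

def cure_like(points, n_clusters, n_rep=3, shrink=0.3):
    """Simplified CURE-style agglomeration (no sampling or partitioning)."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        reps = [representatives(c, n_rep, shrink) for c in clusters]
        # merge the two clusters whose representatives are closest
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(
                math.dist(a, b) for a in reps[ij[0]] for b in reps[ij[1]]
            ),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters

# Toy data: two tight groups plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
kept = remove_outliers(points)
clusters = cure_like(kept, n_clusters=2)
```

On the toy data the isolated point is filtered out before clustering, so it can neither form a cluster of its own nor pull a representative away from a dense group. This sketch recomputes representatives on every merge and is quadratic or worse in the number of points; it is meant to show the structure of the pipeline, not the efficiency the paper reports.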
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]