A MICROBLOGGING HOT TOPICS DISCOVERY METHOD BASED ON OUTLIERS ELIMINATION  Cited by: 9

Authors: Lai Jinhui (赖锦辉) [1], Liang Song (梁松) [2]

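Affiliations: [1] Computer Center, Department of Experimental Teaching, Guangdong University of Petrochemical Technology, Maoming, Guangdong 525000; [2] School of Computer and Electronic Information, Guangdong University of Petrochemical Technology, Maoming, Guangdong 525000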

Source: Computer Applications and Software, 2014, No. 1, pp. 105-107, 139 (4 pages in total)

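Funding: National Natural Science Foundation of China (60903168); Guangdong Province and Ministry of Education Industry-University-Research Cooperation Project (2010B090400235); Maoming Science and Technology Plan Project (2011008)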

Abstract: Microblog posts are numerous, short, and spread over a wide range of topics, so microblog data contain many isolated points (outliers), which degrade the clustering algorithms used to discover hot topics. To address this, we propose a microblog hot topic discovery method based on outlier elimination. First, the outliers are removed from the dataset; then the CURE (Clustering Using Representatives) algorithm clusters the remaining data that are worth clustering; finally, the effectiveness of the method is verified on examples. The results show that, compared with the baseline clustering algorithm, the proposed method makes the clustering results less sensitive to outliers, improves the accuracy of microblog hot topic discovery, and runs more efficiently, which makes it better suited to large-scale microblog hot topic discovery.
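The two-stage pipeline in the abstract (remove outliers, then cluster what remains with CURE) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not specify the outlier criterion, so `remove_outliers` uses a hypothetical k-nearest-neighbour distance rule whose parameters `k` and `factor` are illustrative. Posts are assumed to have already been converted to numeric vectors (for example TF-IDF). `cure` is a simplified CURE: each cluster keeps up to `c` well-scattered representative points shrunk toward its centroid by the fraction `alpha`, and the two clusters with the closest representatives are merged until `k` clusters remain. The random sampling, partitioning, and heap/k-d tree acceleration of full CURE are omitted.

```python
import math
import statistics
from typing import List, Sequence

Point = Sequence[float]


def remove_outliers(points: List[Point], k: int = 2, factor: float = 2.0) -> List[Point]:
    """Hypothetical outlier rule (not taken from the paper): drop a point whose
    mean distance to its k nearest neighbours exceeds `factor` times the median
    of that score over the whole dataset."""
    if len(points) <= k:
        return list(points)
    scores = []
    for i, p in enumerate(points):
        nearest = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)[:k]
        scores.append(sum(nearest) / k)
    limit = factor * statistics.median(scores)
    return [p for p, s in zip(points, scores) if s <= limit]


def centroid(pts: List[Point]) -> List[float]:
    """Coordinate-wise mean of the points."""
    return [sum(coord) / len(pts) for coord in zip(*pts)]


def representatives(pts: List[Point], c: int, alpha: float) -> List[List[float]]:
    """Choose up to c well-scattered points, then move each one a fraction
    alpha of the way toward the cluster centroid (the shrinking step of CURE)."""
    cen = centroid(pts)
    chosen = [max(pts, key=lambda p: math.dist(p, cen))]  # farthest from centroid
    while len(chosen) < min(c, len(pts)):
        # Next representative: the point farthest from all already chosen ones.
        chosen.append(max(pts, key=lambda p: min(math.dist(p, r) for r in chosen)))
    return [[x + alpha * (m - x) for x, m in zip(r, cen)] for r in chosen]


def cure(points: List[Point], k: int, c: int = 4, alpha: float = 0.3) -> List[List[Point]]:
    """Simplified CURE: agglomerative merging by the closest pair of
    representatives, without sampling, partitioning or heap/k-d tree speedups."""
    clusters = [[p] for p in points]
    reps = [representatives(cl, c, alpha) for cl in clusters]
    while len(clusters) > max(k, 1):
        best = None  # (distance, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in reps[i] for b in reps[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        reps[i] = representatives(clusters[i], c, alpha)
        del clusters[j], reps[j]  # j > i, so index i stays valid
    return clusters


if __name__ == "__main__":
    # Toy 2-D vectors: two dense "topics" and one isolated post.
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
            (20.0, -20.0)]
    kept = remove_outliers(data)
    for topic in cure(kept, k=2):
        print(topic)
```

In this sketch, removing isolated points before clustering means they cannot survive as tiny clusters that take up some of the `k` cluster slots and appear as spurious "topics". Shrinking the representatives toward the centroid further reduces the pull of any boundary points that remain.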

Keywords: microblog hot topic; outlier; CURE algorithm; discovery

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]
