Research on the Improvement of the K-Means Algorithm Based on Internet Public Opinion  Cited by: 3

The Improvement of K-Means Clustering Algorithm based on Internet Public Opinion


Authors: Luo Huixia (罗晖霞) [1], Qu Xiaoling (曲晓玲)

Affiliations: [1] North University of China, Taiyuan 030051; [2] General Office of the Shanxi Provincial People's Government, Taiyuan 030002

Source: Computer Development & Applications (电脑开发与应用), 2010, No. 8, pp. 4-6, 15 (4 pages)

Funding: Project funded by the Shanxi Provincial Department of Personnel (SX20090108-07)

Abstract: The traditional K-Means clustering algorithm is guaranteed to converge only to a local optimum, so its clustering results are highly sensitive to the choice of initial representative points. Agglomerative hierarchical clustering needs no initial cluster centers and can automatically produce clusterings of a text collection at different levels, but it has higher computational complexity and its merging process is irreversible. Taking the characteristics of Internet public opinion into account, this paper analyzes in depth the strengths and weaknesses of the K-Means and agglomerative hierarchical clustering algorithms and uses this analysis to improve K-Means. The core idea of the improved algorithm is to combine the respective advantages of the two algorithms in initial point selection and in the clustering process, integrating them into one optimized procedure. Experimental analysis and practical application show that the improved text clustering algorithm can considerably improve the accuracy and validity of clustering results for Internet public opinion information, as well as the efficiency of the algorithm.
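The abstract does not spell out the improved algorithm's steps. The following is a minimal Python sketch of one common way to combine the two methods, under assumptions not confirmed by the abstract: average-linkage agglomerative clustering is run until K clusters remain, their centroids become the initial K-Means centers, and ordinary K-Means (Lloyd iterations) then refines the partition. The function names (`agglomerative_seeds`, `kmeans`, `hybrid_kmeans`) and the `sample_size` option are hypothetical, and documents are assumed to be already converted to numeric vectors (for example TF-IDF) compared with Euclidean distance.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def centroid(points: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def average_link(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Average pairwise Euclidean distance between two clusters (average linkage)."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))


def agglomerative_seeds(points: Sequence[Vector], k: int) -> List[Vector]:
    """Repeatedly merge the closest pair of clusters until k remain; return their centroids.

    Naive version: every merge rescans all cluster pairs, so the cost is roughly cubic in len(points).
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best: Optional[Tuple[float, int, int]] = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_link(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # a merge is never undone (irreversible)
        del clusters[j]
    return [centroid(c) for c in clusters]


def assign(points: Sequence[Vector], centers: Sequence[Vector]) -> List[int]:
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c])) for p in points]


def kmeans(points: Sequence[Vector], centers: Sequence[Vector], max_iter: int = 100):
    """Standard Lloyd iterations starting from the given initial centers."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep an empty cluster's previous center instead of dropping it.
            new_centers.append(centroid(members) if members else centers[c])
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    return assign(points, centers), centers


def hybrid_kmeans(points: Sequence[Vector], k: int,
                  sample_size: Optional[int] = None, seed: int = 0):
    """Seed K-Means with agglomerative centroids, optionally computed on a random sample."""
    if sample_size is None or sample_size >= len(points):
        sample = list(points)
    else:
        sample = random.Random(seed).sample(list(points), sample_size)
    return kmeans(points, agglomerative_seeds(sample, k))


if __name__ == "__main__":
    # Toy "document vectors": three well-separated groups of 2-D points.
    docs = [
        [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
        [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
        [9.0, 0.1], [9.1, 0.0], [8.9, 0.2],
    ]
    labels, centers = hybrid_kmeans(docs, k=3)
    print(labels)  # → [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

In this sketch the agglomerative step removes the dependence on randomly chosen initial points, while K-Means does the bulk of the clustering and can still move points between clusters, which the hierarchical merges alone cannot. Running the agglomerative step on a random sample (`sample_size`) is one way to limit its cost on large collections; cosine similarity, often preferred for text, could replace `math.dist` with small changes.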

Keywords: Internet public opinion; text clustering; K-Means algorithm; agglomerative hierarchical clustering; clustering process

CLC classification number: TP312 [Automation and Computer Technology: Computer Software and Theory]
