Comparative research on text knowledge discovery for network public opinion

Cited by: 1

Authors: Jiao Lulin (焦潞林), Peng Yan (彭岩)[1], Lin Yun (林云)[1]

Affiliation: [1] School of Management, Capital Normal University, Beijing 100048, China

Source: Journal of Shandong University (Natural Science), 2014, No. 9, pp. 62-68, 82 (8 pages)

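Funding: Beijing Natural Science Foundation (9142002); General Program of the Science and Technology Plan of the Beijing Municipal Education Commission (KM201410028020)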

Abstract: For the field of network public opinion analysis, five clustering algorithms were studied: system (hierarchical) clustering, string kernels, the K-nearest neighbor (KNN) algorithm, the support vector machine (SVM) algorithm, and topic models. Using network public opinion data as the data set and the R language environment as the experimental tool, the strengths and weaknesses of the five algorithms were compared and simulation experiments were carried out. The experimental results show that topic models are better suited to text clustering than the other algorithms. Among the topic models, the CTM (correlated topic model) method is more suitable for exploring and discovering relations between classes, while the Gibbs sampling method outperforms the CTM method in text clustering.

Keywords: topic model; text knowledge discovery; text clustering; network public opinion

Classification code: TP309 [Automation and Computer Technology: Computer System Architecture]

 
