Source: Journal of Shandong University (Natural Science), 2014, No. 9, pp. 62-68, 82 (8 pages)
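Funding: Beijing Natural Science Foundation project (9142002); General Program of the Science and Technology Plan of the Beijing Municipal Education Commission (KM201410028020)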
Abstract: For the field of network public opinion analysis, five clustering algorithms were studied: hierarchical (system) clustering, string kernels, the K-nearest neighbor (KNN) algorithm, the support vector machine (SVM) algorithm, and topic models. Using network public opinion data as the data set and the R language environment as the experimental tool, the strengths and weaknesses of the five algorithms were compared, and simulation experiments were carried out. The experimental results show that topic models are better suited to text clustering than the other algorithms. Among the topic models, the CTM (correlated topic model) method is better suited to exploring and discovering relations between categories, while the Gibbs sampling method outperforms the CTM method on text clustering.
Classification number: TP309 [Automation and Computer Technology - Computer System Architecture]