Multi-granularity subtopic division based on the LDA model and HowNet

Cited by: 9

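Affiliations: [1] School of Information Management, Wuhan University, Wuhan 430072; [2] Center for Studies of Information Resources, Wuhan University, Wuhan 430072; [3] Wuhan University Library, Wuhan 430072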

Source: Application Research of Computers (《计算机应用研究》), 2015, No. 6, pp. 1625-1629 (5 pages)

Abstract: LDA modeling results tend to be overly general, and documents belonging to different subtopics are often highly similar. To address these problems, this paper proposes MGH-LDA, a multi-granularity subtopic division method that combines the latent Dirichlet allocation (LDA) model with the HowNet semantic dictionary. First, the LDA model performs an initial division of news collections drawn from different news sources, and documents on the same news topic are grouped according to their document contribution degree. Second, multi-granularity (coarse and fine) features are extracted on the basis of the TF-IDF model and used as a core-word feature set to represent each news document; the similarity between news documents is then computed from word-level semantic similarity using the HowNet semantic dictionary. Finally, the news documents are clustered with the single-pass incremental clustering algorithm to obtain the subtopic division. Experiments on a real news data set show that the method effectively improves the accuracy of subtopic division for hot news topics.
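The final clustering step can be illustrated with a minimal single-pass sketch. This is not the authors' implementation: the abstract does not give the similarity formula, the threshold, or how a document is compared against a cluster. The sketch below assumes each document is already reduced to a set of core words, uses a plain Jaccard overlap as a stand-in for the HowNet-based semantic similarity, compares a document to a cluster by the mean similarity to its members, and uses a hypothetical threshold of 0.3.

```python
from typing import Callable, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Stand-in similarity: keyword-set overlap (NOT HowNet semantic similarity)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def single_pass(
    docs: List[Set[str]],
    similarity: Callable[[Set[str], Set[str]], float] = jaccard,
    threshold: float = 0.3,  # hypothetical value, not taken from the paper
) -> List[List[int]]:
    """Assign each document, in arrival order, to the most similar existing
    cluster, or open a new cluster when no score reaches the threshold."""
    clusters: List[List[int]] = []
    for i, doc in enumerate(docs):
        best_cluster, best_score = -1, 0.0
        for c, members in enumerate(clusters):
            # Assumed cluster score: mean similarity to the current members.
            score = sum(similarity(doc, docs[m]) for m in members) / len(members)
            if score > best_score:
                best_cluster, best_score = c, score
        if best_cluster >= 0 and best_score >= threshold:
            clusters[best_cluster].append(i)
        else:
            clusters.append([i])
    return clusters


# Toy core-word sets standing in for TF-IDF-selected features.
docs = [
    {"earthquake", "rescue", "casualties"},
    {"earthquake", "rescue", "team"},
    {"donation", "relief", "funds"},
    {"relief", "funds", "charity"},
]
clusters = single_pass(docs)
print(clusters)  # [[0, 1], [2, 3]]
```

Because `similarity` is a parameter, a HowNet-based word similarity could be substituted without changing the clustering loop. The result of single-pass clustering depends on document order and on the threshold, so both would need tuning on real data.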

Keywords: news reports; subtopic division; multi-granularity; latent Dirichlet allocation model; semantic similarity computation

Classification code: TP391.4 [Automation and Computer Technology: Computer Application Technology]
