Comparative Study on the Effect of LDA and BTM Probabilistic Subject Model in Extracting Scientific Subject

Cited by: 11

Authors: ZHANG Wenwei; ZHAO Hui [1] (Institute of science and technology of China, Beijing 100038, China)

Source: Technology Intelligence Engineering (《情报工程》), 2020, No. 2, pp. 66-77 (12 pages)

Funding: Innovation Research Fund of the Institute of Scientific and Technical Information of China, MS2020-02.

Abstract: Analyzing the topics of the literature is the foundation for tracing how a scientific field develops. Several methods exist for extracting topics from the literature; the one most widely used by researchers is the probabilistic topic model. Different algorithms and different corpora yield different topics. This paper uses recall, precision, and qualitative analysis to compare four configurations: LDA applied to titles, LDA applied to abstracts, BTM applied to titles, and BTM applied to abstracts. Taking data from the nanomaterials field as an example, the experiments show that topics extracted from an abstract corpus are finer-grained than those extracted from titles and reflect the details of the research content. LDA outperforms BTM in extracting topics from abstracts, whereas BTM outperforms LDA in extracting topics from titles.
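The abstract names precision and recall as the quantitative measures but does not spell out how they were computed. As a minimal sketch, assuming each configuration yields a set of topic terms that is compared against a reference term set, the two measures could be calculated as below. The function name `precision_recall`, the reference terms, and the per-configuration term lists are all hypothetical illustrations, not data or code from the paper.

```python
def precision_recall(extracted, reference):
    """Return (precision, recall) of extracted topic terms against a reference set.

    precision = |extracted ∩ reference| / |extracted|
    recall    = |extracted ∩ reference| / |reference|
    """
    extracted_set = set(extracted)
    reference_set = set(reference)
    hits = extracted_set & reference_set
    precision = len(hits) / len(extracted_set) if extracted_set else 0.0
    recall = len(hits) / len(reference_set) if reference_set else 0.0
    return precision, recall


# Hypothetical reference terms for a nanomaterials corpus (made up for illustration).
reference_terms = ["graphene", "nanotube", "quantum dot", "nanowire"]

# Hypothetical top terms from the four configurations compared in the abstract.
runs = {
    "LDA-title": ["graphene", "synthesis", "property"],
    "LDA-abstract": ["graphene", "nanotube", "quantum dot", "catalyst"],
    "BTM-title": ["graphene", "nanotube", "catalyst"],
    "BTM-abstract": ["graphene", "nanowire", "method", "result"],
}

for name, terms in runs.items():
    p, r = precision_recall(terms, reference_terms)
    print(f"{name}: precision={p:.2f}, recall={r:.2f}")
```

Treating the terms as sets ignores topic weights and term order; a study that ranks terms or matches whole topics to reference categories would need a different comparison.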

Keywords: LDA; BTM; topic extraction; comparative analysis

Classification number: G350.7 [Culture Science - Information Science]
