Emergent Research Topic Discovery in Journals from the Perspective of Deep Semantic Features of Domain Literature (领域文献深层语义特征视角下的期刊新兴研究主题发现)  Cited by: 6

Authors: Liu Jiangfeng (刘江峰), Wang Xiyu (王希羽), Zhang Jundong (张君冬), Kong Ling (孔玲)[5], Pei Lei (裴雷)[1,2], Wang Dongbo (王东波)

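Affiliations: [1] School of Information Management, Nanjing University, Nanjing, Jiangsu 210023; [2] Laboratory of Data Intelligence and Interdisciplinary Innovation, Nanjing University, Nanjing, Jiangsu 210023; [3] College of Information Management, Nanjing Agricultural University, Nanjing, Jiangsu 210095; [4] Research Center for Humanities and Social Computing, Nanjing Agricultural University, Nanjing, Jiangsu 210095; [5] School of Information Management, Shandong University of Technology, Zibo, Shandong 255049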

Source: Information Studies: Theory & Application (《情报理论与实践》), 2024, No. 3, pp. 177-187 (11 pages)

Abstract: [Purpose/significance] Analyzing the emerging topics embedded in the research content of journals in a specific field, from the perspective of the deep semantic features of the literature, helps researchers understand the field's research hotspots and identify directions for further research. [Method/process] First, taking JASIST, a journal in library and information science, as an example, we use BERT and its derived models to identify key research sentences based on sentence-level semantic features; next, we propose a language model enhancement scheme based on masked language modeling (MLM); finally, building on the identification results, we use BERTopic to discover emerging topics and analyze their evolution over key research sentences and abstracts. [Result/conclusion] Overall sentence identification achieves an F1 score above 80%; the MLM-based domain model improves key sentence identification by about 1 to 2 percentage points over the baseline model; and 7 emerging hot research topics are discovered with BERTopic. The proposed key sentence identification and BERTopic-based topic computation scheme can effectively mine emerging topics and assist researchers.
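The abstract does not describe the MLM-based enhancement scheme in detail. As background only, the sketch below illustrates the standard masking recipe from the original BERT pre-training setup (select about 15% of tokens; of those, replace 80% with `[MASK]`, 10% with a random token, and leave 10% unchanged). It is not the authors' implementation; the function name `mask_tokens`, the toy vocabulary, and the whitespace tokenization are all assumptions made for illustration.

```python
import random

MASK = "[MASK]"  # placeholder token, as used by BERT-style vocabularies


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Apply BERT-style MLM corruption to a list of tokens.

    Returns (inputs, labels): `inputs` is the corrupted sequence and
    `labels[i]` holds the original token at each selected position,
    or None where no prediction is required.
    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = rng or random.Random()
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK               # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


if __name__ == "__main__":
    # Toy sentence with naive whitespace tokenization (an assumption).
    sentence = "we propose a topic model for emerging research themes".split()
    toy_vocab = sorted(set(sentence))
    corrupted, targets = mask_tokens(sentence, toy_vocab, rng=random.Random(0))
    print(corrupted)
    print(targets)
```

In domain-adaptive pre-training, a model is typically trained to recover `labels` from `inputs` on in-domain text (here, that would be journal abstracts) before being fine-tuned on the downstream sentence identification task.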

Keywords: pre-trained language model; masked language model; topic computation; BERTopic

Classification number: G255.2 [Cultural Science - Library Science]

 
