Domain Label Acquisition Method Based on SL-LDA Model

Cited by: 2


Authors: WANG Sheng, ZHANG Yang-sen [1,2], ZHANG Wen, JIANG Yu-ru [1,2], ZHANG Rui [1]

Affiliations: [1] Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100101, China; [2] Beijing Laboratory of National Economic Security Early Warning Engineering, Beijing 100044, China

Source: Computer Science (《计算机科学》), 2020, No. 11, pp. 95-100 (6 pages)

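Funding: National Natural Science Foundation of China (61772081, 61602044); Science and Technology Innovation Service Capacity Building, Research Base Construction, Beijing Laboratory: Beijing Laboratory of National Economic Security Early Warning Engineering project (PXM2018_014224_000010).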

Abstract: Advances in science and technology pose new challenges for managing literature and scholars. To support the automatic management of massive scientific literature and its authors, this paper proposes a domain label acquisition method based on SL-LDA. Working from a large corpus of scientific literature, the paper analyzes the distribution characteristics of the data and constructs the SL-LDA topic model by introducing word frequency features of scientific literature. The topic model is applied to the publications of a single scholar to perform "topic-phrase" extraction, which yields the initial domain keywords. A domain taxonomy is then introduced: the topic model's extraction results and the taxonomy labels are represented as vectors, weighted by position features, and mapped onto the taxonomy by similarity, which gives the scholar's final domain labels. Experimental results show that, with the same amount of literature data, the SL-LDA model produces better label words and higher accuracy than the traditional LDA model, the statistics-based TF-IDF algorithm, and the graph-based TextRank algorithm, and the F1 value rises to 0.572. This indicates that the SL-LDA-based domain label acquisition method is well suited to the academic domain.
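The abstract does not give the details of the mapping step, so the following is only a minimal sketch of one way position-weighted similarity mapping could look. Everything in it is an assumption for illustration: the function names `cosine` and `map_to_labels`, the toy three-dimensional vectors, the two taxonomy labels, and the numeric weights standing in for the paper's position features. It is not the authors' implementation, and it does not model SL-LDA itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def map_to_labels(keywords, label_vectors, top_k=1):
    """Rank taxonomy labels by the weighted sum of cosine similarities.

    keywords: list of (vector, weight) pairs, one per extracted keyword.
              The weight is a hypothetical stand-in for a position feature.
    label_vectors: dict mapping a taxonomy label to its vector.
    """
    scores = {label: 0.0 for label in label_vectors}
    for vec, weight in keywords:
        for label, label_vec in label_vectors.items():
            scores[label] += weight * cosine(vec, label_vec)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]


# Toy data (hypothetical): two taxonomy labels and two extracted keywords.
label_vectors = {
    "natural language processing": [1.0, 0.0, 0.0],
    "computer vision": [0.0, 1.0, 0.0],
}
keywords = [
    ([0.9, 0.1, 0.0], 2.0),  # assumed to come from a prominent position, so weighted higher
    ([0.2, 0.8, 0.0], 1.0),  # assumed to come from a less prominent position
]

print(map_to_labels(keywords, label_vectors))
```

In this toy setup the higher-weighted keyword dominates the score, so the first label ranks highest. In practice the vectors would come from some embedding of the extracted phrases and the taxonomy labels, and the weighting scheme would follow whatever position features the full paper defines.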

Keywords: domain label; SL-LDA model; label mapping; topic phrase extraction; scientific literature

Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
