Extraction of Keywords from Academic Papers Based on Deep Learning and Terminology Bank


Authors: CHEN Ruoyu [1,2]; LI Yan; WU Zhuo; DU Zhenlei [3]

Affiliations: [1] Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100192, China; [2] Computer School, Beijing Information Science and Technology University, Beijing 102206, China; [3] China National Committee for Terminology in Science and Technology, Beijing 100717, China

Source: Journal of Kunming University of Science and Technology (Natural Science), 2024, No. 6, pp. 57-63 (7 pages)

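Funding: Scientific research project of the State Language Commission (YB135-155); "Qinxin Talents" Cultivation Program of Beijing Information Science and Technology University (QXTCPC202111).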

Abstract: Keywords in academic papers play an important role in revealing a paper's themes, improving the accuracy of literature retrieval, and promoting academic communication. To address the problem of non-standard keyword selection in academic papers, abstracts of a number of Chinese academic papers in computer science were collected with web crawlers, and a dataset mapping paper abstracts to standardized terms was annotated based on the terminology bank approved by the China National Committee for Terminology in Science and Technology. Using this dataset and deep learning, a matching model between paper abstracts and standardized terms was built to enable computer-aided keyword extraction for academic papers. Experiments verified the feasibility of the proposed method. In addition, incremental training and evaluation based on experience replay verified the model's ability to generalize under incremental learning.
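The matching task described in the abstract, scoring each standardized term against a paper abstract and keeping the best candidates as keywords, can be illustrated with a minimal sketch. It is not the authors' model: the paper uses a trained deep learning matcher, whereas this sketch substitutes plain character-bigram cosine similarity as a non-neural stand-in. The function names (`char_ngrams`, `cosine`, `rank_terms`), the sample abstract, and the four-term bank are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count overlapping character n-grams (usable for Chinese or English text)."""
    text = text.lower().replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_terms(abstract: str, term_bank: list[str], top_k: int = 3) -> list[tuple[str, float]]:
    """Score every term in the bank against the abstract and return the top_k.

    Assumption: lexical similarity stands in for the learned matching score.
    """
    abstract_vec = char_ngrams(abstract)
    scored = [(term, cosine(abstract_vec, char_ngrams(term))) for term in term_bank]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical inputs, not taken from the paper's dataset.
sample_abstract = "a matching model for keyword extraction with deep learning"
sample_bank = ["deep learning", "keyword extraction", "operating system", "compiler"]

for term, score in rank_terms(sample_abstract, sample_bank, top_k=2):
    print(f"{term}\t{score:.3f}")
```

Restricting candidates to a fixed term bank is what keeps the output standardized: the ranker can only return terms that already exist in the bank, so free-form or inconsistent keywords cannot appear. A learned matcher would replace `cosine` with a model score while keeping the same ranking loop.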

Keywords: deep learning; terminology bank; academic papers; keyword extraction

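Classification codes: TP391.1 [Automation and Computer Technology: Computer Application Technology]; TP18 [Automation and Computer Technology: Computer Science and Technology]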

 
