Review of Automatic Citation Classification Based on Deep Learning Technology

Chinese title: 基于深度学习技术的科技文献引文分类研究综述; times cited: 1

Authors: 李俊飞 (LI JunFei), 徐黎明 (XU LiMing), 汪洋 (WANG Yang) [1,2], 魏鑫 (WEI Xin)

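Affiliations: [1] Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China; [2] School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China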

Source: Frontiers of Data & Computing (《数据与计算发展前沿》), 2023, No. 4, pp. 86-100 (15 pages)

Funding: Chinese Academy of Sciences Situational Awareness Operation, Maintenance and Application Support Project (WX1450201-0105-02).

Abstract: [Objective] Citation classification of scientific and technological literature underpins academic impact evaluation and literature retrieval and recommendation. Driven by advances in deep neural networks and pre-trained language models, research on citation classification has made substantial progress, and many deep-learning-based citation classification methods, models, and datasets have been proposed. However, a comprehensive survey of existing methods and recent trends is still lacking, and this paper explores that gap.
[Methods] This paper reviews deep-learning-based citation classification models and datasets for scientific and technological literature. It compares and analyzes the classification performance of different models, summarizes their strengths and weaknesses, and gives an overall account of citation classification techniques. It then discusses future development directions and offers recommendations.
[Results] Classification models based on pre-trained language models learn global semantic representations effectively. They alleviate two problems of earlier architectures: the low training efficiency of RNNs (Recurrent Neural Networks) and the limited length of the sequence dependencies that CNNs (Convolutional Neural Networks) can extract from text. As a result, they significantly improve classification accuracy.
[Limitations] This paper focuses on the progress of citation classification techniques for scientific and technological literature. It does not attempt a comprehensive forecast of future technical directions.

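Keywords: citation classification of scientific and technological literature; pre-trained language model; deep learning; natural language processing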

CLC number: G254.1 [Culture and Science: Library Science]

 
