Research on Intelligent Classification and Indexing of Document Digital Resources Based on Deep Learning (基于深度学习的文献数字资源智能分类标引研究)


Authors: Wang Jing (王静)[1]; Jiang Peng (姜鹏); Shen Lili (沈立力)[1]

Affiliation: [1] Shanghai Library (Shanghai Institute of Science and Technology Information), Shanghai 200031, China

Source: Library and Information Studies (《图书情报研究》), 2023, No. 4, pp. 43-48, 64 (7 pages)

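Funding: One of the research outcomes of the Shanghai Library Youth Yangfan Program special project "Research and Application of Intelligent Classification and Indexing of Document Digital Resources Based on Deep Learning".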

Abstract: [Purpose/Significance] This paper studies and builds an intelligent classification and indexing system based on deep learning that classifies and indexes document digital resources correctly, with the aim of reducing the manual cost of document classification and indexing. [Method/Process] First, six models or algorithms (the BERT-Base model, the Bayesian algorithm, Text-CNN, adversarial training, IndRNN, and LSTM) are compared on the classification of economics document digital resources; the BERT-Base model achieves the highest classification accuracy. Second, document digital resources in three further categories (art; metallurgy and metal technology; medicine and health) are selected for validation; the BERT-Base model performs well on all of them, which meets the generality requirement. Finally, a first-level (top-level category) classification model is built on the BERT-Base Chinese pre-trained model; the model is pre-trained and applied to document classification, reaching an overall accuracy of 90.44% on the first-level classification test. [Result/Conclusion] The deep learning algorithm based on the BERT-Base Chinese pre-trained model significantly improves the classification of document digital resources, and its advantage is more evident with many categories and a large-scale training set.

Keywords: deep learning; BERT; document classification; digital resources

Classification code: G254.1 [Cultural Science - Library Science]
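
The abstract compares several classifiers by classification accuracy and reports an overall accuracy for the first-level test. As a rough illustration of that kind of comparison, the following minimal sketch computes overall and per-class accuracy and picks the best-scoring model. It is not the authors' code: the function names, the model names `model_a` and `model_b`, and the toy labels are hypothetical, and the class codes (F, J, TG, R) serve only as example labels for the four subject areas named in the abstract.

```python
from collections import defaultdict


def overall_accuracy(gold, pred):
    """Fraction of documents whose predicted class equals the gold class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("no documents to evaluate")
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / len(gold)


def per_class_accuracy(gold, pred):
    """Accuracy computed separately for each gold class."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for g, p in zip(gold, pred):
        totals[g] += 1
        if g == p:
            hits[g] += 1
    return {c: hits[c] / totals[c] for c in totals}


def best_model(gold, predictions_by_model):
    """Return the model name with the highest overall accuracy, plus all scores."""
    scores = {
        name: overall_accuracy(gold, pred)
        for name, pred in predictions_by_model.items()
    }
    return max(scores, key=scores.get), scores


# Toy, made-up labels for illustration only (NOT data from the paper).
gold = ["F", "J", "TG", "R", "F", "R"]
predictions_by_model = {
    "model_a": ["F", "J", "TG", "R", "F", "J"],  # hypothetical classifier
    "model_b": ["F", "F", "TG", "R", "J", "J"],  # hypothetical classifier
}

winner, scores = best_model(gold, predictions_by_model)
print(winner, scores)
print(per_class_accuracy(gold, predictions_by_model[winner]))
```

Per-class accuracy is reported alongside the overall figure because a single overall number can hide weak categories when the test set covers many classes of unequal size.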

 
