Authors: Wang Jing (王静)[1], Jiang Peng (姜鹏), Shen Lili (沈立力)[1]
Affiliation: [1] Shanghai Library (Shanghai Institute of Science and Technology Information), Shanghai 200031, China
Source: Library and Information Studies (《图书情报研究》), 2023, No. 4, pp. 43-48, 64 (7 pages in total)
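Funding: One of the research outputs of the Shanghai Library Youth Yangfan Program special project "Research and Application of Intelligent Classification and Indexing of Document Digital Resources Based on Deep Learning".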
Abstract: [Purpose/Significance] This paper studies and builds an intelligent classification and indexing system based on deep learning, and uses it to classify and index document digital resources correctly, with the aim of reducing the manual cost of document classification and indexing. [Method/Process] First, six models or algorithms (the BERT-Base model, the Bayesian algorithm, Text-CNN, adversarial training, IndRNN, and LSTM) are compared on the classification of economics document digital resources; the BERT-Base model achieves the highest classification accuracy. Second, document digital resources from three further categories (art; metallurgy and metal technology; medicine and health) are used for verification, and the BERT-Base model performs well on each, which satisfies the requirement of generality. Finally, a first-level category classification model for document digital resources is built on the BERT-Base Chinese pre-trained model; the model is pre-trained and used for document classification, reaching an overall accuracy of 90.44% on the first-level category classification test. [Result/Conclusion] The deep learning approach based on the BERT-Base Chinese pre-trained model significantly improves the classification of document digital resources, and its advantage is more evident with a large-scale, multi-category training set.