Automatic Classification Research of Similar Categories Based on BERT-MLDFA: Take E271 and E712.51 in CLC as an Example (Cited by: 1)


Authors: LI XiangDong; SHI Jian; SUN QianRu; HE ChaoCheng

Affiliations: School of Information Management, Wuhan University, Wuhan 430072, China; Center for Electronic Commerce Research and Development, Wuhan University, Wuhan 430072, China

Source: Digital Library Forum (《数字图书馆论坛》), 2022, No. 2, pp. 18-25 (8 pages)

Funding: Supported by the Wuhan University Youth Research Center survey project "Modeling and Simulation of the 'Involution' Mechanism among College Students" (No. 20210407).

Abstract: For similar categories in the Chinese Library Classification (CLC) that are highly correlated and weakly differentiated, this paper explores a deep learning method to improve classification performance. We build a BERT-MLDFA model, which dynamically fuses the parameters of different BERT layers through a multi-level attention mechanism and is further pre-trained on the task dataset; automatic classification experiments are then conducted on E271 and E712.51, two typical similar categories in the CLC. The results show that the Macro_F1 of the proposed method reaches 0.987, a 2.4% improvement over classical machine learning methods. The proposed method can capture subtle semantic differences between texts from similar categories, applies well to the CLC and to other similar categories, and therefore generalizes broadly.
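The record does not spell out the internals of MLDFA; a common way to realize "dynamic fusion of different BERT layers" is to learn one attention score per layer and combine the per-layer hidden states with softmax-normalized weights. The sketch below is a minimal, hypothetical NumPy illustration of that idea (the function name `fuse_layers`, the tensor shapes, and the stand-in random hidden states are assumptions, not the authors' code); in the actual model the layer scores would be trained jointly with the classifier.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def fuse_layers(layer_states, layer_logits):
    """Fuse per-layer encoder outputs with learned attention weights.

    layer_states: (num_layers, seq_len, hidden) hidden states, one per BERT layer.
    layer_logits: (num_layers,) learnable scores; softmax turns them into weights.
    Returns a (seq_len, hidden) fused representation fed to the classifier head.
    """
    weights = softmax(layer_logits)                      # (num_layers,)
    return np.tensordot(weights, layer_states, axes=1)   # weighted sum over layers

# Stand-in for the 12 BERT layer outputs of one sentence (seq_len=8, hidden=16).
rng = np.random.default_rng(0)
states = rng.standard_normal((12, 8, 16))
logits = np.zeros(12)          # untrained scores: uniform weight on every layer
fused = fuse_layers(states, logits)
print(fused.shape)             # (8, 16)
```

With zero (untrained) logits the fusion reduces to a plain average over layers; training the logits lets the model emphasize whichever layers best separate near-duplicate categories such as E271 and E712.51.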

Keywords: Chinese Library Classification (CLC); deep learning; BERT; automatic classification

CLC Number: G250.7 [Culture and Science - Library Science]

 
