Multi-label Classification of Digital Archives Based on ALBERT-Seq2Seq-Attention Model


Authors: WANG Shaoyang; CHENG Xinmin[1]; WANG Ruiqin[1]; CHEN Jingwen; ZHOU Yang; FEI Zhigao (School of Information Engineering, Huzhou University, Huzhou 313000, China)

Affiliation: [1] School of Information Engineering, Huzhou University, Huzhou 313000, Zhejiang, China

Source: Journal of Huzhou University, 2024, No. 2, pp. 65-72 (8 pages)

Funding: National Natural Science Foundation of China (62277016); Huzhou University Graduate Research Innovation Project (2022KYCX45).

Abstract: To address the lack of correlation among classification labels in existing multi-label classification methods for digital archives, a deep neural network model for archive multi-label classification, ALBERT-Seq2Seq-Attention, is proposed. The model extracts text feature vectors and captures contextual semantic information through the multi-layer bidirectional Transformer structure inside the ALBERT (A Lite BERT) pretrained language model. The text features extracted by pretraining are then fed as the input sequence to a Seq2Seq-Attention (Sequence to Sequence with Attention) model, and a label dictionary is constructed to capture the dependencies among multiple labels. Comparative experiments with the classification model on three datasets show that its F1 scores all exceed 90%. The model not only improves multi-label classification of archive texts but also attends to the correlations among labels.

Keywords: ALBERT; Seq2Seq; Attention; multi-label classification; digital archives

Classification Code: TP391 [Automation and Computer Technology: Computer Application Technology]
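The decoding stage described in the abstract treats multi-label prediction as sequence generation: ALBERT token features serve as the encoder memory, and each decoder step attends over them before emitting the next label from the label dictionary. A minimal NumPy sketch of one such attention-based decoding step is shown below. All dimensions, the random stand-in for ALBERT features, and the single output projection `w_out` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the paper does not specify these.
seq_len, hidden = 8, 16      # encoder sequence length / feature size
num_labels = 5               # size of the label dictionary

# Stand-in for ALBERT token features (in the paper these come from
# the pretrained ALBERT encoder, not from random numbers).
encoder_states = rng.normal(size=(seq_len, hidden))

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_step(dec_hidden, enc_states, w_out):
    """One Seq2Seq-Attention decoding step (dot-product attention)."""
    scores = enc_states @ dec_hidden           # (seq_len,) alignment scores
    weights = softmax(scores)                  # attention distribution
    context = weights @ enc_states             # (hidden,) context vector
    logits = w_out @ np.concatenate([dec_hidden, context])
    return int(np.argmax(logits)), weights     # next label id, attention

# Illustrative decoder state and output projection.
w_out = rng.normal(size=(num_labels, 2 * hidden))
dec_hidden = rng.normal(size=(hidden,))

label, attn = decode_step(dec_hidden, encoder_states, w_out)
print(label, attn.sum())  # predicted label id; attention weights sum to 1
```

In a full model the predicted label would be embedded and fed back as the next decoder input, so earlier labels condition later ones; this is the mechanism by which the label dictionary captures inter-label dependencies.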
