Multi-Label Text Classification Method Based on Label Semantic Matching Fusion


Authors: WEN Yongjun [1], LIU Suiyuan, CUI Zhihao

Affiliation: [1] School of Physical & Electric Science, Changsha University of Science & Technology, Changsha, Hunan 410114, China

Source: Journal of Xiangtan University (Natural Science Edition), 2024, No. 3, pp. 82-93 (12 pages)

Abstract: Current research on multi-label text classification suffers from three problems: effective information is not fully extracted from the text, correlations between labels are ignored, and the semantic attention that the text pays to labels is insufficiently mined and exploited. To address them, this paper proposes a multi-label text classification method based on label semantic matching fusion. First, the DeBERTa model computes fine-grained, word-level text representations. In parallel, label graph data are constructed from the global co-occurrence of labels, and a graph attention network (GAT) automatically learns the degree of association between different labels, producing label feature embeddings that capture both the structural information and the deep correlations among labels. Next, an embedding fusion mechanism based on label semantic matching models the semantic attention of the text toward the labels, reflecting the semantic association between the two. The resulting word-level fusion representations, built from the label-semantic-matching embeddings, are fed into a CNN for feature interaction, which finally produces the label predictions. Experiments on two public English datasets, AAPD and RCV1-V2, show that the proposed model clearly outperforms other mainstream baseline models.
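The data flow described in the abstract can be traced end to end with a small NumPy sketch. It is not the authors' implementation. The abstract does not specify how the co-occurrence graph is thresholded, how many GAT heads or layers are used, how the matched label embeddings are fused with the word vectors, or how the CNN is configured. The sketch therefore assumes a conditional-probability threshold `tau` on label co-occurrence, one single-head GAT layer in the standard formulation, scaled dot-product word-to-label attention whose label context is concatenated to each word vector, and a single width-3 convolution with max-over-time pooling and a per-label sigmoid. Random vectors stand in for DeBERTa outputs, all weights are untrained, and every function name is hypothetical.

```python
import numpy as np


def build_label_graph(Y, tau=0.3):
    """Label adjacency from global co-occurrence (assumed: threshold on P(j | i))."""
    C = Y.T @ Y                                   # C[i, j] = number of docs carrying labels i and j
    counts = np.maximum(np.diag(C), 1)            # guard against labels that never occur
    P = C / counts[:, None]                       # P[i, j] estimates P(label j | label i)
    A = (P >= tau).astype(float)
    np.fill_diagonal(A, 1.0)                      # self-loops so every label attends to itself
    return A


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gat_layer(H, A, W, a, slope=0.2):
    """Single-head graph attention layer, attention restricted to edges present in A."""
    Z = H @ W                                     # (L, d) projected label features
    d = Z.shape[1]
    e = (Z @ a[:d])[:, None] + (Z @ a[d:])[None, :]   # e_ij = a1.z_i + a2.z_j
    e = np.where(e > 0, e, slope * e)             # LeakyReLU
    e = np.where(A > 0, e, -np.inf)               # mask labels that do not co-occur
    alpha = softmax(e, axis=1)                    # attention over neighbouring labels
    out = alpha @ Z
    return np.where(out > 0, out, np.expm1(np.minimum(out, 0.0)))  # ELU


def label_matching_fusion(Hw, E):
    """Each word attends over label embeddings; its label context is concatenated to it."""
    d = Hw.shape[1]
    scores = Hw @ E.T / np.sqrt(d)                # (n_words, L) word-label semantic match
    alpha = softmax(scores, axis=1)               # distribution over labels for each word
    context = alpha @ E                           # (n_words, d) label-aware context vector
    return np.concatenate([Hw, context], axis=1)  # (n_words, 2d) fused word representation


def cnn_predict(X, Wc, bc, Wo, bo, k=3):
    """1-D convolution over word positions, max-over-time pooling, sigmoid per label."""
    n = X.shape[0]
    windows = np.stack([X[i:i + k].reshape(-1) for i in range(n - k + 1)])
    feats = np.maximum(windows @ Wc + bc, 0.0)    # ReLU feature maps, one row per window
    pooled = feats.max(axis=0)                    # (F,) strongest response per filter
    logits = pooled @ Wo + bo
    return 1.0 / (1.0 + np.exp(-logits))          # independent probability for each label


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_docs, n_labels, n_words, d, F = 50, 6, 12, 16, 8
    Y = (rng.random((n_docs, n_labels)) < 0.3).astype(int)  # toy document-label matrix
    Hw = rng.normal(size=(n_words, d))            # stand-in for DeBERTa word vectors
    A = build_label_graph(Y)
    E = gat_layer(rng.normal(size=(n_labels, d)), A,
                  rng.normal(size=(d, d)) * 0.1, rng.normal(size=2 * d) * 0.1)
    X = label_matching_fusion(Hw, E)
    probs = cnn_predict(X, rng.normal(size=(3 * 2 * d, F)) * 0.1, np.zeros(F),
                        rng.normal(size=(F, n_labels)) * 0.1, np.zeros(n_labels))
    predicted = np.flatnonzero(probs >= 0.5)      # multi-label decision by a 0.5 threshold
    print(probs.round(3), predicted)
```

Running the script prints one probability per label and the indices that pass the 0.5 threshold. Because nothing is trained, the numbers only confirm that the shapes line up; the 0.5 decision threshold is likewise an assumption, not a value taken from the paper.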

Keywords: multi-label text classification; DeBERTa; graph attention network (GAT); label semantic embedding

CLC number: TP311 [Automation and Computer Technology: Computer Software and Theory]
