
A Long Text Classification Model Based on Graph Contrast Learning


Authors: LIU Yuhao; GAO Rong; YAN Lingyu[1]; YE Zhiwei[1] (School of Computer Science, Hubei University of Technology, Wuhan 430068, China)

Affiliation: [1] School of Computer Science, Hubei University of Technology, Wuhan 430068, China

Source: Journal of Hubei University of Technology, 2023, No. 5, pp. 67-74 (8 pages)

Abstract: Current character-level text classification methods struggle on long texts: the large input dimension makes computation expensive, and the long content makes long-distance relationships hard to capture, leading to insufficient accuracy. To address this, a graph contrastive learning model for long text classification based on an adaptive view generator and negative-sampling optimization is proposed. The long text is first split into paragraphs, which are embedded with a BERT-derived model. A graph is then constructed from the high-level structure of the text, with the paragraph embeddings as nodes. The graph is augmented by an adaptive view generator, and a text embedding is obtained through graph contrastive learning; in the negative-sampling phase, PU Learning is introduced to correct negative-sampling bias. Finally, the text embedding is classified by two linear layers. Experiments on two Chinese datasets show that the method outperforms mainstream state-of-the-art models.
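The contrastive stage described in the abstract can be sketched as follows. This is an illustrative numpy implementation, not the authors' code: two augmented views of the paragraph-graph node embeddings are scored with an InfoNCE-style loss, and the negative term is corrected in the PU-learning spirit by treating sampled "negatives" as a mixture of true negatives and unlabeled same-class positives with an assumed class prior `pos_prior` (a hypothetical parameter introduced here for the sketch).

```python
import numpy as np

def cosine_sim(a, b):
    """Pairwise cosine similarity between rows of a and b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def debiased_contrastive_loss(z1, z2, tau=0.5, pos_prior=0.1):
    """InfoNCE loss between two augmented views z1, z2 (N x d),
    with a PU-style debiased negative term: the sum over sampled
    negatives is assumed to contain a pos_prior fraction of hidden
    positives, whose estimated contribution is subtracted out."""
    n = z1.shape[0]
    sim = np.exp(cosine_sim(z1, z2) / tau)            # N x N similarity scores
    pos = np.diag(sim)                                 # matched view pairs
    mask = ~np.eye(n, dtype=bool)
    neg_sum = sim[mask].reshape(n, n - 1).sum(axis=1)  # all mismatched pairs
    m = n - 1
    # PU-style correction of the negative term
    corrected = (neg_sum - m * pos_prior * pos) / (1.0 - pos_prior)
    # floor keeps the corrected term from going non-positive
    corrected = np.maximum(corrected, m * np.exp(-1.0 / tau))
    return float(np.mean(-np.log(pos / (pos + corrected))))
```

As a sanity check, two identical views yield a much lower loss than two unrelated random views, since the matched-pair similarities dominate the corrected negatives.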

Keywords: text representation; long text classification; graph contrastive learning; negative sampling

Classification Code: TP391.1 (Automation and Computer Technology: Computer Application Technology)

 
