Node classification with graph structure prompt in low-resource scenarios


Authors: CHEN Yuling; LI Xiang (School of Data Science & Engineering, East China Normal University, Shanghai 200062, China)

Affiliation: [1] School of Data Science and Engineering, East China Normal University, Shanghai 200062, China

Source: Computer Engineering & Science, 2025, Issue 3, pp. 534-547 (14 pages)

Abstract: Text-attributed graphs have increasingly become a hotspot in graph research. In traditional graph neural network (GNN) research, the node features used are typically shallow features derived from text or manually designed features, such as those from the skip-gram and continuous bag-of-words (CBOW) models. In recent years, the advent of large language models (LLMs) has profoundly transformed natural language processing (NLP). These changes have not only affected NLP tasks but have also begun to permeate GNNs. Consequently, recent graph work has started to introduce language representation models and LLMs to generate new node representations, aiming to mine richer semantic information. Most existing models still adopt traditional GNN architectures or contrastive learning approaches. Among the contrastive learning methods, because traditional node features and the node representations generated by language models are not produced by a unified model, these methods face the challenge of handling two vectors that lie in different vector spaces. Motivated by these challenges, a model named GRASS is proposed. Specifically, in the pre-training task, GRASS introduces text information expanded by an LLM and contrasts it with text information processed by graph convolution. In downstream tasks, to reduce the cost of fine-tuning, GRASS aligns the format of downstream tasks with that of the pre-training task. As a result, GRASS performs well on node classification without fine-tuning, especially in few-shot scenarios. For example, in the 1-shot setting, GRASS improves accuracy over the best baseline by 6.10%, 6.22%, and 5.21% on the Cora, Pubmed, and ogbn-arxiv datasets, respectively.
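To make the pre-training idea described above concrete, the following is a minimal, hypothetical PyTorch sketch (not the authors' GRASS implementation): a small GCN encodes shallow text features, an InfoNCE-style loss contrasts each node's graph-side embedding with the embedding of its LLM-expanded text, and a prototype-similarity step illustrates how a downstream classifier can reuse the pre-training format without fine-tuning. All names (GCNEncoder, info_nce, classify_by_prototypes, tau) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNEncoder(nn.Module):
    """Two-layer GCN over a dense, symmetrically normalized adjacency matrix."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim)
        self.w2 = nn.Linear(hid_dim, out_dim)

    def forward(self, a_hat, x):
        # a_hat: normalized adjacency with self-loops, shape (n, n)
        h = F.relu(self.w1(a_hat @ x))
        return self.w2(a_hat @ h)

def info_nce(z_graph, z_llm, tau=0.5):
    """Contrast graph-side and LLM-side views of the same nodes.

    Matching rows (same node) are positives; all other rows serve as
    negatives, pulling the two vector spaces into one shared space.
    """
    z1 = F.normalize(z_graph, dim=-1)
    z2 = F.normalize(z_llm, dim=-1)
    logits = z1 @ z2.t() / tau                        # (n, n) similarities
    labels = torch.arange(z1.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

def classify_by_prototypes(z_nodes, z_protos):
    """Prompt-style downstream step: label = nearest class prototype.

    Prototypes can be mean embeddings of the few labeled nodes per class,
    so the downstream task keeps the similarity form of pre-training and
    needs no fine-tuning.
    """
    sims = F.normalize(z_nodes, dim=-1) @ F.normalize(z_protos, dim=-1).t()
    return sims.argmax(dim=-1)

# Toy usage with random data (4 nodes, identity adjacency for brevity):
n, d = 4, 16
a_hat = torch.eye(n)
shallow_x = torch.randn(n, d)    # shallow text features (e.g., bag-of-words)
llm_x = torch.randn(n, 32)       # embeddings of LLM-expanded node text
enc = GCNEncoder(d, 64, 32)
loss = info_nce(enc(a_hat, shallow_x), llm_x)
```

In this sketch the contrastive loss is what resolves the two-vector-space challenge the abstract raises: rather than concatenating heterogeneous features, the objective itself aligns the GCN output space with the LLM embedding space.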

Keywords: graph neural network; text-attributed graph; large language model; contrastive learning; pre-training; prompt learning

Classification: TP391.41 [Automation and Computer Technology - Computer Application Technology]

 
