Semi-supervised website topic classification based on heterogeneous graph neural network

Authors: WANG Xie-zhong; CHEN Xu; JING Yong-jun [1]; WANG Shu-yang [2]

Affiliations: [1] School of Computer Science and Engineering, North Minzu University, Yinchuan 750000, Ningxia, China; [2] School of Electrical and Information Engineering, North Minzu University, Yinchuan 750000, Ningxia, China

Source: Computer Engineering & Science, 2024, No. 4, pp. 635-646 (12 pages)

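Funding: Key Research and Development Program of Ningxia Hui Autonomous Region (2023BDE02017); Fundamental Research Funds for the Central Universities, North Minzu University (2022PT_S04).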

Abstract: The rapid growth in the number of Internet websites has made it difficult for existing methods to classify the topics of specific websites accurately. URL-based methods cannot handle topic information that is not reflected in the URL, while content-based methods are limited by data sparsity and by the difficulty of capturing semantic relationships. To address this, a semi-supervised website topic classification method based on a heterogeneous graph neural network, HGNN-SWT, is proposed. The method uses website text features to compensate for the shortcomings of URL-only features, and it models the sparse relationships between website texts and words as a heterogeneous graph, improving classification performance by exploiting the node and edge relationships in the graph. It also introduces a random-walk-based neighbor node sampling method that takes into account both the local features of nodes and the global graph structure, and proposes a feature fusion strategy to capture contextual relationships and feature interactions in website text data. Experiments on a self-built Chinaz Website dataset show that HGNN-SWT achieves higher accuracy than existing methods on the website topic classification task.
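The abstract does not specify how the site-word graph is built or how the random walks are run. As a rough illustration only, the sketch below assumes a bipartite graph that links each website to the tokens in its text, and uses a random walk with restart to pick the most frequently visited neighbors of each node type. The function names (`build_graph`, `sample_neighbors`), the parameters (`walk_len`, `restart_p`, `top_k`) and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def build_graph(site_texts):
    """Build an undirected site-word graph.

    Nodes are tuples: ("site", name) or ("word", token).
    Whitespace tokenization is an assumption made for this sketch.
    """
    adj = defaultdict(set)
    for site, text in site_texts.items():
        for token in text.split():
            s, w = ("site", site), ("word", token)
            adj[s].add(w)
            adj[w].add(s)
    # Sorted lists keep sampling reproducible for a fixed seed.
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


def sample_neighbors(adj, start, walk_len=200, restart_p=0.5, top_k=2, seed=0):
    """Random walk with restart from `start`.

    Returns, per node type, up to `top_k` most frequently visited nodes.
    """
    rng = random.Random(seed)
    counts = Counter()
    current = start
    for _ in range(walk_len):
        if rng.random() < restart_p:
            current = start  # restart keeps the walk near the start node
        else:
            current = rng.choice(adj[current])  # step to a random neighbor
        if current != start:
            counts[current] += 1
    by_type = defaultdict(list)
    for (ntype, name), _ in counts.most_common():
        if len(by_type[ntype]) < top_k:
            by_type[ntype].append(name)
    return dict(by_type)


if __name__ == "__main__":
    toy = {
        "a.com": "news sports",
        "b.com": "sports games",
        "c.com": "games shop",
    }
    graph = build_graph(toy)
    print(sample_neighbors(graph, ("site", "a.com")))
```

A higher `restart_p` keeps the sampled neighbors close to the start node (local features), while a lower value lets the walk reach sites that only share words indirectly (more global structure). Grouping the result by node type is one way to hand type-specific neighbor sets to a heterogeneous aggregator.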

Keywords: website topic; heterogeneous graph neural network; semi-supervised; feature fusion

Classification code: TP393 [Automation and Computer Technology: Computer Application Technology]

 
