Authors: ZHANG Tao (张陶); LIAO Bin (廖彬)[3]; YU Jiong (于炯); LI Ming (李敏)[2,4]; SUN Ruina (孙瑞娜)[4]
Affiliations: [1] College of Information Engineering, Guizhou University of Traditional Chinese Medicine, Guiyang 550025, China; [2] School of Information Science and Engineering, Xinjiang University, Urumqi 830008, China; [3] College of Big Data Statistics, Guizhou University of Finance and Economics, Guiyang 550025, China; [4] College of Statistics and Information, Xinjiang University of Finance and Economics, Urumqi 830012, China
Source: Computer Science (《计算机科学》), 2024, Issue 4, pp. 132-150 (19 pages)
Funding: National Natural Science Foundation of China (61562078); Xinjiang Tianshan Youth Program (2018Q073).
Abstract: Because graph neural network (GNN) models use an end-to-end architecture, they can better coordinate the learning of hidden node features with the classification objective during training, and they substantially outperform graph embedding methods on tasks such as node classification. However, the experimental comparisons in existing GNN work commonly suffer from several problems: a single type of dataset, insufficient sample size, non-standard dataset splits, a limited number and range of comparison models, a single evaluation metric, and no comparison of model training time. To address this, the paper selects 20 datasets from different domains (citation networks, social networks, collaboration networks, etc.), including cora, citeseer, pubmed and deezer, and runs a comprehensive and fair node classification benchmark on 17 mainstream GNN models, including FastGCN, PPNP, ChebyNet and DAGNN, using accuracy, precision, recall, F-score and model training time as evaluation metrics, in order to provide a decision reference for model selection in real business scenarios. The benchmark experiments show that, on the one hand, the factors affecting model training speed rank, in order, as node attribute dimension, number of graph nodes and number of graph edges; on the other hand, there is no winner-take-all model, that is, no model performs well on all datasets. In particular, under a fair benchmark configuration, structurally simple models outperform complex GNN models.
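The multi-metric evaluation named in the abstract (accuracy, precision, recall, F-score and training time) can be illustrated with a minimal sketch. It is not the authors' evaluation code: the abstract does not say how per-class scores are averaged, so macro averaging is an assumption here, and the helper names `macro_metrics` and `timed` are hypothetical.

```python
import time
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1.

    Macro averaging is an assumption; the abstract does not state
    which averaging mode the benchmark uses.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


def timed(train_fn, *args, **kwargs):
    """Run a training callable and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = train_fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

Wrapping each model's training call in `timed` and scoring its test-set predictions with `macro_metrics` yields one row of a results table per model and dataset, which is the kind of comparison the abstract describes.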
Keywords: graph neural network; benchmarking; node classification; performance evaluation; model selection
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]