Author Affiliations: [1] School of Computer Science, Northwestern Polytechnical University, Xi'an 710129 [2] Engineering Research Center of Embedded System Integration, Ministry of Education, Xi'an 710129 [3] National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, Xi'an 710129 [4] Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049
Source: Chinese Journal of Computers, 2024, No. 7, pp. 1469-1484 (16 pages)
Funding: Supported by the National Natural Science Foundation of China: "Research on Stochastic Near-Memory Computing Specialized Architectures for Edge Intelligence" (No. 62272393) and "Computational Optimization and Hardware-Software Co-Design for Autonomous Localization and Environment Modeling of Unmanned Systems" (No. 62076193).
Abstract: Neural Architecture Search (NAS) designs neural network architectures through search algorithms and is widely used in computer vision, natural language processing, and other fields; compared with manually designed networks, NAS reduces design cost and improves model performance. However, performance evaluation in NAS requires training large numbers of candidate architectures, and this accounts for more than 80% of the total NAS computation. To reduce this computational and time cost, many Transformer-based NAS predictors have been proposed in recent years: the Transformer's strong structural encoding ability represents topological information well, so such predictors have been widely adopted. Nevertheless, existing Transformer-based NAS predictors still face three problems. First, in the preprocessing stage, traditional one-hot encoding describes node features weakly: it can distinguish operation types but cannot express detailed operation attributes such as convolution kernel size, which are critical for accurately modeling network behavior. Second, in the encoding stage, the Transformer's self-attention mechanism loses model structure information; while powerful at capturing global dependencies, it does not fully preserve the hierarchical and spatial relationships inherent in an architecture. Third, in the evaluation stage, existing Transformer predictors use only a Multilayer Perceptron (MLP) to predict accuracy from the forward-propagation graph, ignoring the effect of the backward-propagation gradient flow; they therefore fail to fit the alternating forward/backward information-flow graph of NAS evaluation, and the error between predicted and actual accuracy fluctuates widely (10%-90%). To address these problems, this paper proposes a NAS performance prediction method based on a structure-enhanced Transformer. First, in the preprocessing stage, a hyperdimensional embedding method increases the input data dimensionality to strengthen the parameter description of node operations. Second, in the encoding stage, the Transformer-encoded features and the graph structure information are fed jointly into a Graph Convolutional Network (GCN) to compensate for the structural loss caused by self-attention. Finally, in the performance evaluation stage, a full training graph containing both forward and backward propagation is constructed, and the dataset information, graph structure encoding, and gradient encoding are input together to the GCN predictor, bringing predictions closer to the model's true performance. Experimental results show that, compared with the current state of the art, the proposed method improves the Kendall correlation coefficient by 7.45% and reduces training time by a factor of 1.55.
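To make the preprocessing problem concrete, the sketch below contrasts plain one-hot node encoding with a parameter-augmented embedding. The operation vocabulary, normalization constants, and function names are illustrative assumptions, not the paper's exact hyperdimensional scheme.

```python
# Sketch: one-hot encoding collapses conv3x3 and conv5x5 into the same code;
# appending normalized operation parameters makes them separable.
import torch

OPS = ["conv", "pool", "skip", "none"]  # hypothetical operation vocabulary

def one_hot(op: str) -> torch.Tensor:
    """Distinguishes operation *types* only: all convolutions collide."""
    v = torch.zeros(len(OPS))
    v[OPS.index(op)] = 1.0
    return v

def augmented_embedding(op: str, kernel: int = 0, channels: int = 0) -> torch.Tensor:
    """Appends normalized operation parameters (kernel size, width) so that
    otherwise-identical one-hot codes become distinct in a higher dimension."""
    params = torch.tensor([kernel / 7.0, channels / 512.0])  # ad-hoc normalization
    return torch.cat([one_hot(op), params])

# conv3x3 vs conv5x5: identical under one-hot, distinct once parameters are appended
print(one_hot("conv"))                                  # same vector for both
print(augmented_embedding("conv", kernel=3, channels=64))
print(augmented_embedding("conv", kernel=5, channels=64))
```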
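For the encoding stage, a minimal sketch of how per-node Transformer encodings can be recombined with the architecture's adjacency matrix through one standard GCN propagation step, H' = ReLU(D^(-1/2)(A+I)D^(-1/2) H W). The single-layer design and dimensions are assumptions for illustration, not the paper's full network.

```python
# Sketch: restoring topology after self-attention by pushing the Transformer's
# node encodings through a graph-convolution step over the architecture DAG.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        a_hat = adj + torch.eye(adj.size(0))        # add self-loops
        deg = a_hat.sum(dim=1)
        d_inv_sqrt = torch.diag(deg.pow(-0.5))      # symmetric degree normalization
        return torch.relu(d_inv_sqrt @ a_hat @ d_inv_sqrt @ self.lin(h))

# 4-node cell: h comes from the Transformer encoder, adj from the architecture graph
h = torch.randn(4, 32)                              # Transformer node encodings
adj = torch.tensor([[0, 1, 1, 0], [0, 0, 0, 1],
                    [0, 0, 0, 1], [0, 0, 0, 0]], dtype=torch.float)
print(GCNLayer(32, 16)(h, adj).shape)               # torch.Size([4, 16])
```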
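The "full training graph" idea can be pictured as mirroring the forward DAG with reversed edges for the gradient flow. How the paper actually encodes gradient nodes is not specified in the abstract; this sketch only illustrates one plausible graph construction.

```python
# Sketch: mirror a forward-propagation DAG into a backward (gradient-flow) DAG
# and join them at the loss node to form a combined training graph.
forward_edges = [(0, 1), (0, 2), (1, 3), (2, 3)]    # forward-propagation DAG
n = 4                                               # number of forward nodes

# Backward node i' = i + n; gradients flow along reversed edges, with a link
# from the loss node (3) into its own backward counterpart.
backward_edges = [(v + n, u + n) for (u, v) in forward_edges]
full_graph = forward_edges + [(3, 3 + n)] + backward_edges
print(full_graph)
```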
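Finally, the ranking metric reported in the experiments is Kendall's correlation coefficient, which measures how well a predictor orders candidate architectures rather than how close its absolute accuracy estimates are. The accuracy values below are made-up placeholders, not results from the paper.

```python
# Sketch: evaluating a NAS predictor with Kendall's tau over ranked accuracies.
from scipy.stats import kendalltau

true_acc = [0.91, 0.88, 0.93, 0.85, 0.90]   # ground-truth architecture accuracies
pred_acc = [0.89, 0.87, 0.94, 0.80, 0.91]   # predictor outputs

tau, p_value = kendalltau(true_acc, pred_acc)
print(f"Kendall tau = {tau:.3f} (p = {p_value:.3f})")
```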
Keywords: performance predictor; NAS; Transformer; GCN; embedding
CLC Number: TP18 [Automation and Computer Technology - Control Theory and Control Engineering]