Authors: Jiayu ZHANG, Jiangrong SHEN, Zeke WANG, Qinghai GUO, Rui YAN, Gang PAN, Huajin TANG
Affiliations:
[1] College of Computer Science and Technology, Zhejiang University, Hangzhou 310000, China
[2] The State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou 310000, China
[3] College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310000, China
[4] MOE Frontier Science Center for Brain Science and Brain-Machine Integration, Zhejiang University, Hangzhou 310000, China
[5] Collaborative Innovation Center of Artificial Intelligence, Zhejiang University, Hangzhou 310000, China
[6] Advanced Computing and Storage Laboratory, Huawei Technologies Co., Ltd., Shenzhen 518000, China
Source: Science China (Information Sciences) [中国科学(信息科学)(英文版)], 2024, Issue 10, pp. 111-124 (14 pages)
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62236007, 62306274, 61925603) and the Huawei and Zhejiang University Brain-Inspired Computing Joint Innovation Research Project (Grant No. FA2019111021-2023-01).
Abstract: In the era of large-scale pretrained models, artificial neural networks (ANNs) have excelled in natural language understanding (NLU) tasks. However, their success often necessitates substantial computational resources and energy consumption. To address this, we explore the potential of spiking neural networks (SNNs) in NLU, a promising avenue with demonstrated advantages, including reduced power consumption and improved efficiency due to their event-driven characteristics. We propose SpikingMiniLM, a novel spiking Transformer model tailored for natural language understanding. We first introduce a multi-step encoding method to convert text embeddings into spike trains. Subsequently, we redesign the attention mechanism and residual connections so that our model operates on a pure spike-based paradigm without any normalization technique. To facilitate stable and fast convergence, we propose a general parameter initialization method grounded in the stable firing rate principle. Furthermore, we apply ANN-to-SNN knowledge distillation to overcome the challenges of pretraining SNNs. Our approach achieves a macro-average score of 75.5 on the dev sets of the GLUE benchmark, retaining 98% of the performance exhibited by the teacher model MiniLMv2. Our smaller model also achieves performance similar to BERT-MINI with fewer parameters and much lower energy consumption, underscoring its competitiveness and resource efficiency in NLU tasks.
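As context for the multi-step encoding mentioned in the abstract: the paper's actual method is not reproduced in this record, but the general idea (converting a continuous text embedding into a binary spike train over several timesteps) can be illustrated with a minimal leaky integrate-and-fire (LIF) sketch. All function names and hyperparameters below (lif_multistep_encode, timesteps, tau, v_threshold) are hypothetical choices for illustration, not SpikingMiniLM's implementation.

# Minimal sketch (NOT the paper's implementation) of multi-step spike
# encoding: a continuous embedding is fed repeatedly into an LIF layer
# for T timesteps, producing a binary {0, 1} spike train.
import torch

def lif_multistep_encode(embeddings: torch.Tensor,
                         timesteps: int = 4,
                         tau: float = 2.0,
                         v_threshold: float = 1.0) -> torch.Tensor:
    """Encode (batch, seq, dim) embeddings into (T, batch, seq, dim) spikes."""
    v = torch.zeros_like(embeddings)       # membrane potential
    spikes = []
    for _ in range(timesteps):
        v = v + (embeddings - v) / tau     # leaky integration of constant input
        s = (v >= v_threshold).float()     # fire where the threshold is crossed
        v = v * (1.0 - s)                  # hard reset of fired neurons
        spikes.append(s)
    return torch.stack(spikes)             # binary tensor over T timesteps

if __name__ == "__main__":
    emb = torch.rand(2, 8, 16) * 2.0                  # toy "text embeddings"
    spike_train = lif_multistep_encode(emb)
    print(spike_train.shape, spike_train.unique())    # [4, 2, 8, 16], values {0., 1.}

Downstream spiking layers (e.g., the spike-based attention named in the keywords) would then consume this binary train timestep by timestep, which is what enables the event-driven efficiency the abstract claims.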
Keywords: spiking neural networks; natural language understanding; spiking Transformer; spike-based attention; multi-step encoding; ANN-to-SNN distillation