SpikingMiniLM: energy-efficient spiking transformer for natural language understanding


Authors: Jiayu ZHANG, Jiangrong SHEN, Zeke WANG, Qinghai GUO, Rui YAN, Gang PAN, Huajin TANG

Affiliations: [1] College of Computer Science and Technology, Zhejiang University, Hangzhou 310000, China; [2] The State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou 310000, China; [3] College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310000, China; [4] MOE Frontier Science Center for Brain Science and Brain-Machine Integration, Zhejiang University, Hangzhou 310000, China; [5] Collaborative Innovation Center of Artificial Intelligence, Zhejiang University, Hangzhou 310000, China; [6] Advanced Computing and Storage Laboratory, Huawei Technologies Co., Ltd., Shenzhen 518000, China

Source: Science China (Information Sciences), 2024, No. 10, pp. 111-124 (14 pages)

Funding: supported by the National Natural Science Foundation of China (Grant Nos. 62236007, 62306274, 61925603) and the Huawei and Zhejiang University Brain-Inspired Computing Joint Innovation Research Project (Grant No. FA2019111021-2023-01).

Abstract: In the era of large-scale pretrained models, artificial neural networks (ANNs) have excelled in natural language understanding (NLU) tasks. However, their success often necessitates substantial computational resources and energy consumption. To address this, we explore the potential of spiking neural networks (SNNs) in NLU, a promising avenue with demonstrated advantages, including reduced power consumption and improved efficiency due to their event-driven characteristics. We propose SpikingMiniLM, a novel spiking Transformer model tailored for natural language understanding. We first introduce a multi-step encoding method to convert text embeddings into spike trains. Subsequently, we redesign the attention mechanism and residual connections so that our model operates in a purely spike-based paradigm without any normalization technique. To facilitate stable and fast convergence, we propose a general parameter initialization method grounded in the stable firing rate principle. Furthermore, we apply ANN-to-SNN knowledge distillation to overcome the challenges of pretraining SNNs. Our approach achieves a macro-average score of 75.5 on the dev sets of the GLUE benchmark, retaining 98% of the performance exhibited by the teacher model MiniLMv2. Our smaller model also achieves performance similar to BERT_MINI with fewer parameters and much lower energy consumption, underscoring its competitiveness and resource efficiency in NLU tasks.
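
The sketch below illustrates, in a minimal way, what a multi-step encoder that turns a real-valued text embedding into a spike train over T timesteps could look like, using a simple integrate-and-fire neuron with soft reset. The function name multi_step_encode, the timestep count T, and the threshold v_th are assumptions for illustration only; the abstract does not specify the paper's exact encoding scheme.

import numpy as np

def multi_step_encode(embedding, T=4, v_th=1.0):
    # Illustrative only: accumulate the real-valued embedding into a membrane
    # potential at every timestep and emit a binary spike wherever the
    # threshold is crossed, subtracting the threshold on firing (soft reset).
    v = np.zeros_like(embedding, dtype=float)
    spikes = np.zeros((T,) + np.shape(embedding))
    for t in range(T):
        v = v + embedding
        fired = v >= v_th
        spikes[t] = fired.astype(float)
        v = np.where(fired, v - v_th, v)
    return spikes  # shape (T, *embedding.shape)

# Toy usage: encode an 8-dimensional token embedding into a 4-step spike train.
emb = np.random.rand(8)
train = multi_step_encode(emb, T=4)
print(train.shape, train.mean(axis=0))

Under this assumed scheme, the firing rate averaged over the T steps approximates the original embedding magnitude, which is the kind of binary, event-driven representation a purely spike-based attention mechanism could consume downstream.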

Keywords: spiking neural networks; natural language understanding; spiking Transformer; spike-based attention; multi-step encoding; ANN-to-SNN distillation

Classification: TP183 [Automation and Computer Technology / Control Theory and Control Engineering]; TP391.1 [Automation and Computer Technology / Control Science and Engineering]

 
