Hardware-oriented algorithms for softmax and layer normalization of large language models  

Authors: Wenjie LI, Dongxu LYU, Gang WANG, Aokun HU, Ningyi XU, Guanghui HE

Affiliations: [1] School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200241, China; [2] Department of Micro/Nano Electronics, Shanghai Jiao Tong University, Shanghai 200241, China; [3] MoE Key Laboratory of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai 200241, China

Source: Science China Information Sciences, 2024, Issue 10, pp. 81-95 (15 pages).

Funding: Supported by the National Natural Science Foundation of China (Grant No. 62074097).

Abstract: While large language models (LLMs) have sparked a new revolution in natural language processing (NLP), their hardware accelerators have garnered tremendous attention. However, softmax and layer normalization, the most common non-linear operations in LLMs, are frequently overlooked. This paper presents hardware-oriented algorithms for both softmax and layer normalization in LLMs. We propose an approximate approach to implementing division in softmax and extend it to simultaneously computing the square root and performing division in layer normalization; this replaces the original computation with multiplication and shifting. For softmax, we further approximate the exponential function by truncating its exponent, reusing the subtraction already involved. For layer normalization, we additionally simplify the computation of the denominator by directly removing the term involving the square of the mean. Furthermore, hardware architectures are developed for the proposed softmax and layer normalization algorithms. They work as plug-and-play units for LLM accelerators, requiring no fine-tuning and introducing negligible performance loss. Compared with state-of-the-art designs, the proposed softmax architecture saves up to 23.45% area cost and 17.39% power consumption, while the proposed layer normalization architecture saves up to 32.70% area cost and 14.29% power consumption.
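
The abstract only outlines the algorithmic ideas, so the NumPy sketch below illustrates one plausible reading of them, not the paper's exact fixed-point design. All function names, the linear endpoint fits, and the exponent-extraction scheme are illustrative assumptions: division and reciprocal square root are decomposed into a power-of-two shift plus one multiplication, the exponential is mapped to base 2 so its integer exponent becomes a shift, and the layer-normalization variance drops the mean-square term.

```python
# Illustrative sketch of the abstract's ideas (assumptions, not the paper's
# exact algorithm): shift-and-multiply division/rsqrt, base-2 exponential
# with a truncated exponent, and a LayerNorm denominator without mu^2.
import numpy as np

LOG2E = 1.4426950408889634  # log2(e); in hardware, a fixed constant multiplier


def approx_exp2(t):
    """Approximate 2^t by splitting t into integer and fractional parts.

    With u = floor(t) and v = t - u in [0, 1): 2^t = 2^u * 2^v, where 2^u
    is a pure shift and 2^v is fitted linearly as 1 + v (exact at v = 0
    and v = 1). A real design would likely use a tighter piecewise fit.
    """
    u = np.floor(t)
    v = t - u
    return (1.0 + v) * np.exp2(u)  # np.exp2(u) stands in for the shift


def approx_reciprocal(d):
    """Replace the division 1/d by multiplication and shifting.

    Decompose d = 2^k * (1 + f) with f in [0, 1). Then 2^(-k) is a shift,
    and 1/(1 + f) is fitted as 1 - f/2 (exact at both endpoints; f/2 is
    itself a one-bit shift).
    """
    k = np.floor(np.log2(d))   # exponent extraction: a leading-one detector
    f = d * np.exp2(-k) - 1.0  # mantissa fraction in [0, 1)
    return (1.0 - 0.5 * f) * np.exp2(-k)


def approx_rsqrt(d):
    """Compute 1/sqrt(d) in one step, again via multiplication and shifting.

    With d = 2^k * (1 + f): 1/sqrt(d) = 2^(-k/2) / sqrt(1 + f). Writing
    k = 2q + r (r in {0, 1}) makes 2^(-q) a shift, leaves a fixed 1/sqrt(2)
    constant when r = 1, and 1/sqrt(1 + f) is fitted linearly.
    """
    k = np.floor(np.log2(d))
    f = d * np.exp2(-k) - 1.0
    q, r = np.divmod(k, 2)
    odd = np.where(r == 0, 1.0, 2.0 ** -0.5)
    return (1.0 - 0.29289 * f) * odd * np.exp2(-q)  # fit exact at f = 0, 1


def approx_softmax(x):
    """Softmax with e^x mapped to base 2 and the division approximated.

    The max-subtraction already needed for numerical safety is reused:
    t = (x - max(x)) * log2(e) <= 0, so approx_exp2 only sees
    non-positive inputs.
    """
    t = (x - np.max(x)) * LOG2E
    num = approx_exp2(t)
    return num * approx_reciprocal(np.sum(num))


def approx_layernorm(x, eps=1e-5):
    """LayerNorm whose denominator drops the mean-square term.

    Var(x) = E[x^2] - mu^2 is approximated by E[x^2] alone, removing one
    multiply and one subtract; the remaining 1/sqrt(.) uses approx_rsqrt,
    so square root and division happen in a single pass.
    """
    mu = np.mean(x)
    return (x - mu) * approx_rsqrt(np.mean(x * x) + eps)


if __name__ == "__main__":
    x = np.random.randn(64)
    ref = np.exp(x - x.max()) / np.sum(np.exp(x - x.max()))
    print("softmax max abs error:", np.abs(approx_softmax(x) - ref).max())
    ref_ln = (x - x.mean()) / np.sqrt(x.var() + 1e-5)
    print("layernorm max abs error:", np.abs(approx_layernorm(x) - ref_ln).max())
```

In hardware, the floor/log2 steps above correspond to leading-one detection and the np.exp2 factors to barrel shifts; a fixed-point implementation would replace the floating-point linear fits with small lookup or piecewise-linear units.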

Keywords: large language model; softmax; layer normalization; hardware architecture; Transformer

Classification: TP18 (Automation and Computer Technology / Control Theory and Control Engineering); TP391.1 (Automation and Computer Technology / Control Science and Engineering)

 
