Authors: Wenjie LI, Dongxu LYU, Gang WANG, Aokun HU, Ningyi XU, Guanghui HE
Affiliations: [1] School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200241, China; [2] Department of Micro/Nano Electronics, Shanghai Jiao Tong University, Shanghai 200241, China; [3] MoE Key Laboratory of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai 200241, China
Source: Science China (Information Sciences), 2024, Issue 10, pp. 81-95 (15 pages). Chinese edition title: 中国科学(信息科学)(英文版).
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62074097).
Abstract: While large language models (LLMs) have sparked a new revolution in the field of natural language processing (NLP), their hardware accelerators have garnered tremendous attention. However, softmax and layer normalization, the most common non-linear operations in LLMs, are frequently overlooked. This paper presents hardware-oriented algorithms for both the softmax and layer normalization of LLMs. We propose an approximate approach to implementing division in softmax and extend it to simultaneously compute the square root and perform the division in layer normalization, replacing the original computation with multiplication and shifting. For softmax, we further approximate the exponential function by truncating its exponent and then reusing the involved subtraction. For layer normalization, we additionally simplify the computation of the denominator by directly removing the term involving the square of the mean. Furthermore, hardware architectures are developed for the proposed softmax and layer normalization algorithms. They can work as plug-and-play units for LLM accelerators, requiring no fine-tuning and introducing negligible performance loss. Compared with state-of-the-art designs, the proposed softmax architecture saves up to 23.45% of area cost and 17.39% of power consumption, while the proposed layer normalization architecture saves up to 32.70% of area cost and 14.29% of power consumption.
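The abstract only sketches the algorithms at a high level. As a rough illustration of the general techniques it names, the NumPy sketch below replaces the softmax division with a shift-plus-linear reciprocal approximation, evaluates the exponential through a base-2 split whose integer part is a pure shift, drops the square-of-mean term from the layer-normalization variance, and folds square root and division into a single reciprocal-square-root step. All specifics here (the minimax coefficient 2.9142, the chord fits, the single Newton-Raphson refinement) are illustrative assumptions, not the paper's actual fixed-point design, which truncates exponent bits in hardware.

```python
import numpy as np

def approx_reciprocal(d):
    # Normalize d (> 0) into [0.5, 1) with a power-of-two shift (a
    # leading-one detector in hardware), apply a minimax linear
    # approximation of 1/m on that interval, then shift back.
    k = np.floor(np.log2(d)) + 1.0        # shift amount
    m = d * 2.0 ** (-k)                   # mantissa in [0.5, 1)
    return (2.9142 - 2.0 * m) * 2.0 ** (-k)

def approx_exp(x):
    # exp(x) = 2^(x * log2(e)); the integer part of the exponent is a
    # pure shift, and 2^f on [0, 1) is approximated by the chord 1 + f.
    # The paper instead truncates exponent bits; this float version
    # only illustrates the shift-and-multiply structure.
    t = x * np.log2(np.e)
    i = np.floor(t)
    f = t - i
    return (1.0 + f) * 2.0 ** i

def approx_softmax(x):
    # Max-subtracted softmax with the division replaced by a multiply
    # with the approximate reciprocal of the sum.
    e = approx_exp(x - np.max(x))
    return e * approx_reciprocal(np.sum(e))

def approx_rsqrt(v):
    # Joint square root and division: normalize v into (0.25, 1] with
    # an even shift so the back-shift k/2 stays integral, take a chord
    # fit of 1/sqrt(m), then refine with one Newton-Raphson step whose
    # constants (1.5, 0.5) are themselves shift-friendly.
    k = 2.0 * np.ceil(0.5 * np.log2(v))   # even shift amount
    m = v * 2.0 ** (-k)                   # mantissa in (0.25, 1]
    y = (7.0 / 3.0 - (4.0 / 3.0) * m) * 2.0 ** (-k / 2.0)
    return y * (1.5 - 0.5 * v * y * y)

def approx_layernorm(x, gamma, beta, eps=1e-5):
    # Simplified denominator per the abstract: the (E[x])^2 term is
    # removed, so variance is approximated by the raw second moment.
    mu = np.mean(x)
    var = np.mean(x * x)                  # square-of-mean term dropped
    return gamma * (x - mu) * approx_rsqrt(var + eps) + beta

if __name__ == "__main__":
    x = np.random.randn(64)
    ref = np.exp(x - x.max()); ref /= ref.sum()
    print("softmax max abs error:", np.abs(approx_softmax(x) - ref).max())
    g, b = np.ones_like(x), np.zeros_like(x)
    ref_ln = g * (x - x.mean()) / np.sqrt(x.var() + 1e-5) + b
    print("layernorm max abs error:",
          np.abs(approx_layernorm(x, g, b) - ref_ln).max())
```

Note that the square-of-mean simplification is accurate only when the activations have near-zero mean, which the paper argues holds for LLM layer inputs; this float sketch makes no claim to match the negligible accuracy loss reported for the actual fixed-point architectures.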
Keywords: large language model; softmax; layer normalization; hardware architecture; Transformer