ARCHER: a ReRAM-based accelerator for compressed recommendation systems

Authors: Xinyang SHEN, Xiaofei LIAO, Long ZHENG, Yu HUANG, Dan CHEN, Hai JIN

Affiliation: [1] National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Clusters and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China

Source: Frontiers of Computer Science (English edition), 2024, No. 5, pp. 147-160 (14 pages)

Funding: This work was supported by the National Key R&D Program of China (No. 2022YFB4501403); the National Natural Science Foundation of China (Grant Nos. 62322205, 62072195, 61825202, and 61832006); and the Zhejiang Lab (No. 2022PI0AC02).

Abstract: Recommendation systems are widely used in modern data centers. Random and sparse embedding lookup operations are the main performance bottleneck when processing recommendation systems on traditional platforms, as they induce abundant data movement between computing units and memory. ReRAM-based processing-in-memory (PIM) can resolve this problem by processing embedding vectors where they are stored. However, the embedding table can easily exceed the capacity limit of a monolithic ReRAM-based PIM chip, which induces off-chip accesses that may offset the PIM benefits. Therefore, we deploy the decomposed model on-chip and leverage the high computing efficiency of ReRAM to compensate for the decompression performance loss. In this paper, we propose ARCHER, a ReRAM-based PIM architecture that implements fully on-chip recommendations under resource constraints. First, we make a full analysis of the computation pattern and access pattern on the decomposed table. Based on the computation pattern, we unify the operations of each layer of the decomposed model into multiply-and-accumulate operations. Based on the access observation, we propose a hierarchical mapping schema and a specialized hardware design to maximize resource utilization. Under the unified computation and mapping strategy, we can coordinate the inter-processing-element pipeline. The evaluation shows that ARCHER outperforms the state-of-the-art GPU-based DLRM system, the state-of-the-art near-memory processing recommendation system RecNMP, and the ReRAM-based recommendation accelerator REREC by 15.79×, 2.21×, and 1.21× in terms of performance and 56.06×, 6.45×, and 1.71× in terms of energy savings, respectively.
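The idea of expressing both the embedding lookup and the decompression step as multiply-and-accumulate (MAC) operations can be illustrated with a small sketch. The abstract does not say which decomposition ARCHER uses, so this example assumes a plain two-factor low-rank form (table ≈ A × B) purely for illustration. The names `mac_matvec`, `lookup_decomposed`, and `pooled_lookup`, and the toy matrices, are hypothetical and not taken from the paper.

```python
# Illustrative sketch only. Assumption: the "decomposed model" is a two-factor
# low-rank factorization E ~ A @ B; the paper's actual decomposition may differ.

def mac_matvec(row, matrix):
    """Multiply a row vector by a matrix using only multiply-and-accumulate steps."""
    cols = len(matrix[0])
    out = [0.0] * cols
    for k, x in enumerate(row):
        for j in range(cols):
            out[j] += x * matrix[k][j]  # one MAC
    return out

def lookup_decomposed(index, A, B):
    """Reconstruct embedding row `index` from factors A (n x r) and B (r x d)."""
    selector = [1.0 if i == index else 0.0 for i in range(len(A))]  # one-hot
    compact = mac_matvec(selector, A)  # the lookup becomes a MAC pass over A
    return mac_matvec(compact, B)      # decompression is a second MAC pass

def pooled_lookup(indices, A, B):
    """Sum-pool several lookups by using a multi-hot selector vector."""
    selector = [0.0] * len(A)
    for i in indices:
        selector[i] += 1.0
    return mac_matvec(mac_matvec(selector, A), B)

# Toy factors (hypothetical values): 4 table rows, rank 2, embedding width 3.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
B = [[1.0, 2.0, 3.0], [0.5, 0.0, -1.0]]

print(lookup_decomposed(2, A, B))   # [1.5, 2.0, 2.0]
print(pooled_lookup([0, 3], A, B))  # [2.5, 6.0, 10.0]
```

In this sketch a sparse gather turns into a vector-matrix product, which is the kind of operation a ReRAM crossbar evaluates in place. How ARCHER actually maps the decomposed table onto hardware is described in the full paper, not here.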

Keywords: recommendation system; ReRAM; processing-in-memory; embedding layer

Classification: TP333 [Automation and Computer Technology: Computer System Architecture]

 
