Authors: Xinyang SHEN, Xiaofei LIAO, Long ZHENG, Yu HUANG, Dan CHEN, Hai JIN
Source: Frontiers of Computer Science (计算机科学前沿(英文版)), 2024, Issue 5, pp. 147-160 (14 pages)
Funding: This work was supported by the National Key R&D Program of China (No. 2022YFB4501403); the National Natural Science Foundation of China (Grant Nos. 62322205, 62072195, 61825202, and 61832006); and the Zhejiang Lab (No. 2022PI0AC02).
Abstract: Recommendation systems are widely used in modern data centers. Random and sparse embedding lookup operations are the main performance bottleneck when processing recommendation systems on traditional platforms, because they induce abundant data movement between computing units and memory. ReRAM-based processing-in-memory (PIM) can resolve this problem by processing embedding vectors where they are stored. However, the embedding table can easily exceed the capacity limit of a monolithic ReRAM-based PIM chip, which induces off-chip accesses that may offset the PIM profits. Therefore, we deploy the decomposed model on-chip and leverage the high computing efficiency of ReRAM to compensate for the decompression performance loss. In this paper, we propose ARCHER, a ReRAM-based PIM architecture that implements fully on-chip recommendations under resource constraints. First, we make a full analysis of the computation pattern and access pattern on the decomposed table. Based on the computation pattern, we unify the operations of each layer of the decomposed model into multiply-and-accumulate operations. Based on the access observation, we propose a hierarchical mapping schema and a specialized hardware design to maximize resource utilization. Under the unified computation and mapping strategy, we can coordinate the inter-processing-element pipeline. The evaluation shows that ARCHER outperforms the state-of-the-art GPU-based DLRM system, the state-of-the-art near-memory processing recommendation system RecNMP, and the ReRAM-based recommendation accelerator REREC by 15.79×, 2.21×, and 1.21× in terms of performance and 56.06×, 6.45×, and 1.71× in terms of energy savings, respectively.
Keywords: recommendation system; ReRAM; processing-in-memory; embedding layer
Classification number: TP333 [Automation and Computer Technology - Computer System Architecture]