Authors: FENG Yani (冯雅妮); JIANG Lin (蒋林); SHAN Rui (山蕊); LIU Yang (刘阳); ZHANG Yuan (张园) (School of Electronic Engineering, Xi'an University of Posts & Telecommunications, Xi'an 710121, China; Laboratory of Integrated Circuit, Xi'an University of Science and Technology, Xi'an 710054, China; School of Computer, Xi'an University of Posts & Telecommunications, Xi'an 710121, China)
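Affiliations: [1] School of Electronic Engineering, Xi'an University of Posts & Telecommunications, Xi'an 710121 [2] Laboratory of Integrated Circuit, Xi'an University of Science and Technology, Xi'an 710054 [3] School of Computer, Xi'an University of Posts & Telecommunications, Xi'an 710121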
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), 2020, No. 12, pp. 2028-2038 (11 pages)
Funding: National Natural Science Foundation of China, Nos. 61834005, 61772417, 61802304, 61602377, 61634004; Key Research and Development Program of Shaanxi Province, No. 2017GY-060.
Abstract: An on-chip distributed storage structure meets the highly parallel memory-access demands of an array processor and alleviates the "memory wall" problem to some extent. For remote accesses, however, the long latency of a distributed storage structure remains a serious problem. To address it, an improved real-time dynamic migration mechanism based on a distributed data Cache is designed. The mechanism uses a four-level full interconnection together with a migration interconnection and dynamically schedules remote data according to data access frequency, which effectively reduces remote access latency. Building on the distributed Cache structure of the array processor, the proposed mechanism is verified and tested through parallel implementations of typical algorithms such as motion compensation. Experimental results show that, at an operating frequency of 166.9 MHz, the distributed Cache with real-time dynamic migration provides a memory access bandwidth of up to 10.68 GB/s. Compared with similar structures, remote access latency is reduced by 46.5%.
Keywords: array processor; distributed Cache; dynamic migration; Cache coherence
Classification No.: TP302 [Automation and Computer Technology: Computer System Architecture]