A Concurrent Memory Race Recording Algorithm for Snoop-Based Coherence

Original title: 面向监听一致性协议的并发内存竞争记录算法


Authors: Zhu Suxia[1,2], Chen Deyun[2], Ji Zhenzhou[3], Sun Guanglu[2], Zhang Hao[4]

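Affiliations: [1] Postdoctoral Research Station, School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080; [2] School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080; [3] School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001; [4] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190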

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2016, No. 6, pp. 1238-1248 (11 pages)

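Funding: National Natural Science Foundation of China, Youth Program (61502123); National Natural Science Foundation of China (61173024); National Basic Research Program of China ("973" Program) (2011CB302501); Heilongjiang Provincial Youth Science Foundation (QC2015084); China Postdoctoral Science Foundation (2015M571429)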

Abstract: Memory race recording is a key technique for resolving the execution nondeterminism of multi-core programs. However, existing point-to-point memory race recorders incur high hardware overhead, which makes them difficult to apply to practical chip multiprocessors (CMPs). To reduce this overhead, the paper proposes a point-to-point memory race recording algorithm based on a concurrent logging strategy for CMPs that use a snoop-based cache coherence protocol. When a memory conflict is detected, the algorithm records the current execution points of all threads concurrently, extending the point-to-point race relationship between two threads to all threads in the recording phase, which significantly reduces hardware overhead. It uses a distributed logging mechanism in which each thread keeps a memory race log containing only one side of each race relationship, reducing bandwidth overhead without increasing the size of the memory race log. During replay, the algorithm uses a simplified producer-consumer model with a counting semaphore for each processor core to ensure deterministic replay, improving replay speed and reducing coherence bandwidth overhead. Simulation results on an 8-core CMP system show that the algorithm adds about 171 B of hardware per processor core, records about 2.3 B of log per thousand memory instructions, and adds less than 1.5% interconnect bandwidth overhead in both the recording and replay phases.
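The abstract does not give the algorithm's details, so the following is only a toy software model of the general idea, not the paper's hardware design. It assumes a hypothetical trace format `(core, kind, addr, value)` and a simple conflict rule: an access conflicts if another core wrote the same address, or (for a write) read it, since the last logged point. On each conflict, every core appends its current memory-operation count to its own log. The helper names `record`, `run`, and `replay_order` are illustrative. Replay is simulated sequentially here, running each core up to its next logged point; the per-core counting semaphores mentioned in the abstract are not modeled.

```python
from collections import defaultdict

def record(trace, num_cores):
    """Return per-core logs: each core's op count at every detected conflict."""
    counts = [0] * num_cores
    logs = [[] for _ in range(num_cores)]
    readers = defaultdict(set)  # addr -> cores that read it since the last logged point
    writers = defaultdict(set)  # addr -> cores that wrote it since the last logged point
    for core, kind, addr, _ in trace:
        other_writers = writers[addr] - {core}
        other_readers = readers[addr] - {core}
        if other_writers or (kind == "W" and other_readers):
            # Conflict: all cores log their current execution point.
            for c in range(num_cores):
                logs[c].append(counts[c])
            readers.clear()
            writers.clear()
        (writers if kind == "W" else readers)[addr].add(core)
        counts[core] += 1
    return logs

def run(order):
    """Execute ops in the given global order; return the values each core's reads saw."""
    mem = defaultdict(int)
    seen = defaultdict(list)
    for core, kind, addr, value in order:
        if kind == "W":
            mem[addr] = value
        else:
            seen[core].append(mem[addr])
    return dict(seen)

def replay_order(trace, logs, num_cores):
    """Build an interleaving that respects only the logged points, not the recorded order."""
    per_core = [[op for op in trace if op[0] == c] for c in range(num_cores)]
    start = [0] * num_cores
    num_points = len(logs[0])
    order = []
    for e in range(num_points + 1):
        for c in reversed(range(num_cores)):  # deliberately differs from the recording
            end = logs[c][e] if e < num_points else len(per_core[c])
            order.extend(per_core[c][start[c]:end])
            start[c] = end
    return order

trace = [
    (0, "W", "x", 1),
    (0, "W", "a", 7),
    (1, "W", "b", 8),
    (1, "R", "x", None),  # conflicts with core 0's write to x
    (0, "R", "a", None),
    (1, "W", "x", 5),
    (0, "R", "x", None),  # conflicts with core 1's write to x
]
logs = record(trace, 2)
replayed = replay_order(trace, logs, 2)
print(logs)                           # [[2, 3], [1, 3]]
print(run(replayed) == run(trace))    # True
```

In this model, operations from different cores between two logged points never conflict, so any interleaving that respects the logged points yields the same read values. That is why the replayed order can differ from the recorded one and still reproduce the execution.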

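Keywords: chip multiprocessor; multi-core program; deterministic replay; memory race recording; memory conflict detection; snoop-based cache coherence protocol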

Classification: TP303 [Automation and Computer Technology: Computer System Architecture]

 
