Random forest instruction-level detection model for data races in multithreaded programs

Cited by: 4

Authors: SUN Jiaze (孙家泽); YANG Jiawei (阳伽伟) [1]; YANG Zijiang (杨子江)

Affiliations: [1] School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an 710121, China; [2] Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts and Telecommunications, Xi'an 710121, China; [3] Department of Computer Science, Western Michigan University, Kalamazoo 49008-5466, USA

Source: Journal of Tsinghua University (Science and Technology), 2020, No. 10, pp. 804-813 (10 pages)

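Funding: General Program of the National Natural Science Foundation of China (61876138); Shaanxi Province Industrial Project (2018GY-014); Graduate Innovation Fund of Xi'an University of Posts and Telecommunications (CXJJLA2018003); Special Fund for Key Discipline Construction in Shaanxi Provincial Universities.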

Abstract: A data race is a typical concurrency bug in multithreaded programs, and it is hard to detect because thread interleavings are nondeterministic. This paper uses five attributes related to multithreaded data races as features to build a random forest model for instruction-level data race detection. Data races are first detected at the instruction level using the happens-before relation and the lockset algorithm, while assembly source information is used to eliminate implicit synchronization pairs. The analysis results of the happens-before relation and the lockset algorithm are then used to train the random forest detection model. The resulting detection tool for multithreaded programs, AIRaceTest, is implemented on Pin. Trained on a sample set built from the instrumentation results of multithreaded programs from GitHub, the model reaches an accuracy of 92.1%. Tests on classic multithreaded programs from Google data-race-test and the Parsec benchmark 3.1 show that, compared with three commonly used data race detection tools (Eraser, Djit+, and Thread Sanitizer), AIRaceTest reduces false positives and false negatives by about 10.6% and 12.3%, respectively. With a large number of threads, time and memory overhead are reduced by 41.8% and 22.4%, respectively.
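For readers unfamiliar with the two classical checks named in the abstract, the sketch below illustrates how a lockset test and a happens-before test can be combined to flag a pair of memory accesses as a potential data race. It is an illustration only and not the AIRaceTest implementation: the `Access` record, its fields, and the function names are hypothetical, vector clocks are assumed to be given as dicts mapping thread id to logical time, and the random forest stage and the five features used in the paper are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """One memory access to a shared address (hypothetical record)."""
    thread: int
    is_write: bool
    clock: dict        # vector clock snapshot: thread id -> logical time
    locks: frozenset   # locks held by the thread at the time of the access


def happens_before(a: dict, b: dict) -> bool:
    """True if vector clock a is strictly earlier than vector clock b."""
    keys = set(a) | set(b)
    no_later = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    strictly = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    return no_later and strictly


def is_potential_race(x: Access, y: Access) -> bool:
    """Flag two accesses to the same address as a potential data race."""
    if x.thread == y.thread:
        return False                      # same thread: program order applies
    if not (x.is_write or y.is_write):
        return False                      # two reads never conflict
    if x.locks & y.locks:
        return False                      # lockset check: a common lock protects both
    if happens_before(x.clock, y.clock) or happens_before(y.clock, x.clock):
        return False                      # ordered by synchronization
    return True                           # concurrent, conflicting, unprotected


# Two unsynchronized writes from different threads.
w1 = Access(thread=1, is_write=True, clock={1: 1}, locks=frozenset())
w2 = Access(thread=2, is_write=True, clock={2: 1}, locks=frozenset())

# The same writes, both performed while holding lock "m".
l1 = Access(thread=1, is_write=True, clock={1: 1}, locks=frozenset({"m"}))
l2 = Access(thread=2, is_write=True, clock={2: 1}, locks=frozenset({"m"}))

# A write by thread 2 that has already observed thread 1's write.
o2 = Access(thread=2, is_write=True, clock={1: 1, 2: 1}, locks=frozenset())
```

In this sketch `is_potential_race(w1, w2)` reports a race, while the lock-protected pair `(l1, l2)` and the ordered pair `(w1, o2)` do not. A pure lockset check would still flag `(w1, o2)`, which is one reason the two analyses are commonly used together.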

Keywords: data race; concurrency bug; random forest; implicit synchronization pair

Classification code: TP306 [Automation and Computer Technology: Computer System Architecture]

 
