SDC Error Detection Based on Program Heterogeneous Relation Graph


Authors: WEN Bao; GU Jingjing [1]; LIU Yang; ZHOU Qiang (College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)

Affiliation: [1] College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2025, No. 1, pp. 242-248 (7 pages)

Funding: Supported by the General Program of the National Natural Science Foundation of China (62072235).

Abstract: With the exponential growth in the integration density and scale of Systems on Chip (SoCs), computer systems are increasingly likely to fail after particle-induced bit flips, and their reliability has become a growing concern. Among the many fault types, Silent Data Corruption (SDC) is one of the hardest to detect: it escapes the system's error-correction mechanisms, propagates silently as the program executes, and ultimately corrupts the program output. Most existing SDC error detection methods consider only the static features of instructions, ignore the context between instructions, and lack the ability to explore how SDC propagates. To address this, this paper proposes an SDC error detection method based on a Program Heterogeneous Relation Graph (PHRG). First, a program analysis framework is designed to mine program context information and construct the program heterogeneous relation graph. Second, an instruction SDC vulnerability prediction model is built on a multi-relational graph attention network to mine the critical paths of SDC propagation and identify highly vulnerable instructions. Finally, a fault-tolerance mechanism is designed from the prediction results, applying targeted redundancy to the program to detect SDC errors. Experimental results show that, on the MiBench test set, PHRG achieves higher SDC vulnerability prediction accuracy, a higher SDC detection rate, and lower time and space overhead than existing methods.
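
The abstract gives no implementation details, so the following is only a minimal Python sketch of the general idea behind the three steps: a graph whose edges carry relation types, a selection of instructions by vulnerability score, and a redundancy check applied only to the selected ones. All names (`RelationGraph`, `select_protected`, `checked`), the relation types, the scores, and the threshold are hypothetical. The scores stand in for the output of a prediction model and are not results from the paper, and the duplicate-and-compare check is a generic redundancy technique, not necessarily the fault-tolerance mechanism the authors designed.

```python
from dataclasses import dataclass, field


@dataclass
class RelationGraph:
    """Toy heterogeneous graph: nodes are instruction ids, edges carry a relation type."""
    edges: dict = field(default_factory=dict)  # relation name -> list of (src, dst)

    def add_edge(self, relation, src, dst):
        self.edges.setdefault(relation, []).append((src, dst))

    def successors(self, node, relation):
        """Instructions reachable from `node` over one edge of the given relation."""
        return [dst for src, dst in self.edges.get(relation, []) if src == node]


def select_protected(scores, threshold):
    """Return the instruction ids whose (hypothetical) vulnerability score meets the threshold."""
    return {inst for inst, score in scores.items() if score >= threshold}


def flip_bit(value, bit):
    """Simulate a single-bit upset in an integer value."""
    return value ^ (1 << bit)


def checked(op, *args, fault_bit=None):
    """Run `op` twice and compare the results.

    If `fault_bit` is given, a bit flip is injected into the first copy
    so that the mismatch path can be exercised.
    """
    primary = op(*args)
    if fault_bit is not None:
        primary = flip_bit(primary, fault_bit)
    shadow = op(*args)
    if primary != shadow:
        raise RuntimeError("SDC detected: primary and shadow results differ")
    return primary


# Hypothetical three-instruction program with two relation types.
graph = RelationGraph()
graph.add_edge("data_dep", "i1", "i2")
graph.add_edge("control_dep", "i2", "i3")

# Placeholder scores; a real system would obtain these from a trained model.
scores = {"i1": 0.9, "i2": 0.2, "i3": 0.7}
protected = select_protected(scores, threshold=0.5)
```

Only the instructions in `protected` would be wrapped with `checked`, which is what keeps the overhead below that of duplicating every instruction. Calling `checked(lambda a, b: a + b, 2, 3, fault_bit=0)` raises `RuntimeError`, because the injected flip makes the two copies disagree.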

Keywords: silent data corruption; heterogeneous relation graph; graph attention network; error detection

Classification code: TP302 [Automation and Computer Technology: Computer System Architecture]

 
