Authors: WEN Bao (文宝), GU Jingjing (顾晶晶)[1], LIU Yang (刘阳), ZHOU Qiang (周强)
Affiliation: [1] College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2025, No. 1, pp. 242-248 (7 pages)
Funding: Supported by the National Natural Science Foundation of China, General Program (62072235).
Abstract: With the exponential growth in the integration density and scale of Systems on Chip (SoC), computer systems are increasingly likely to fail after particle-induced bit flips, and their reliability has become a growing concern. Among the many fault types, Silent Data Corruption (SDC) is one of the hardest to detect: it escapes the system's error-correction mechanisms, propagates silently as the program executes, and ultimately corrupts the program output. Most existing SDC error detection methods consider only the static features of instructions, ignore the contextual information between instructions, and therefore cannot explore how SDC propagates. To address this, the paper proposes SDC Error Detection Based on Program Heterogeneous Relation Graph (PHRG). First, a program analysis framework is designed to mine program context information and construct a program heterogeneous relation graph. Second, an instruction-level SDC vulnerability prediction model is built on a Multi-Relational Graph Attention Network to mine the critical paths of SDC propagation and identify highly vulnerable instructions. Finally, a fault-tolerance mechanism is designed from the prediction results, applying targeted redundancy to the program in order to detect SDC errors. Experimental results on the MiBench benchmark suite indicate that, compared with existing methods, PHRG achieves higher SDC vulnerability prediction accuracy, a higher SDC detection rate, and lower time and space overhead.
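The idea of targeted redundancy mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it uses plain duplication-with-comparison on a toy dot product, and every name in it (`flip_bit`, `checked`, `dot`, `SDCDetected`, the `protect` flag, and the fault-injection parameters) is a hypothetical helper introduced only for illustration. In PHRG, the choice of which instructions to protect would come from the vulnerability prediction model; here it is simply a boolean.

```python
from typing import Callable, Optional, Sequence


class SDCDetected(Exception):
    """Raised when the two redundant copies of a computation disagree."""


def flip_bit(value: int, bit: int) -> int:
    """Model a particle-induced upset by flipping one bit of an integer."""
    return value ^ (1 << bit)


def checked(op: Callable[[int, int], int], a: int, b: int,
            fault_bit: Optional[int] = None) -> int:
    """Run op twice and compare the results (duplication with comparison).

    fault_bit, if given, injects a bit flip into the primary copy only.
    """
    primary = op(a, b)
    if fault_bit is not None:
        primary = flip_bit(primary, fault_bit)
    shadow = op(a, b)  # redundant copy, assumed fault-free in this sketch
    if primary != shadow:
        raise SDCDetected(f"mismatch: {primary} != {shadow}")
    return primary


def dot(xs: Sequence[int], ys: Sequence[int], protect: bool,
        fault_at: Optional[int] = None, fault_bit: int = 0) -> int:
    """Dot product whose accumulate step is optionally protected.

    fault_at selects the loop iteration in which a bit flip is injected.
    """
    acc = 0
    for i, (x, y) in enumerate(zip(xs, ys)):
        bit = fault_bit if i == fault_at else None
        if protect:
            acc = checked(lambda p, q: p + q, acc, x * y, bit)
        else:
            acc = acc + x * y
            if bit is not None:
                acc = flip_bit(acc, bit)  # corruption goes unnoticed
    return acc
```

Without protection, an injected flip silently changes the final result; with protection, the same flip raises `SDCDetected`. Protecting every operation roughly doubles the work, which is why selecting only the most vulnerable instructions matters for overhead.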
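Keywords: silent data corruption; heterogeneous relation graph; graph attention network; error detection
Classification: TP302 [Automation and Computer Technology: Computer System Architecture]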