Authors: WEI Xiaohui[1]; GUAN Zeyu; WANG Chenyang; YUE Hengshan; WU Qi[1,2] (School of Computer Science and Technology, Jilin University, Changchun 130012, China; High Performance Computing Center, Jilin University, Changchun 130012, China)
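Affiliations: [1] School of Computer Science and Technology, Jilin University, Changchun 130012; [2] High Performance Computing Center, Jilin University, Changchun 130012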
Source: Computer Science (《计算机科学》), 2025, No. 5, pp. 91-100 (10 pages)
Funding: National Key Research and Development Program of China (2023YFB4502304); National Natural Science Foundation of China (62302190, 62272190).
Abstract: In recent years, as model inference accuracy has continued to improve, convolutional neural networks (CNNs) have been widely applied in safety-critical fields. To meet the real-time, high-performance, and low-power computing demands of CNNs, domain-specific CNN accelerators have emerged. Among these, the systolic array architecture is widely used because of its simple structure and high parallelism. However, owing to process variation and device aging, systolic arrays are prone to Stuck-At faults (SAFs), which can lead to catastrophic accidents. Fault-tolerance strategies for systolic arrays are therefore critically important. Existing strategies, however, incur high time and resource overheads and modify too many network parameters. To achieve an efficient, low-overhead, lightweight fault-tolerance strategy, this paper exploits the inherent fault tolerance of CNNs and relaxes the handling of SAFs that have little impact on the model, reducing the overall fault-tolerance overhead. In addition, taking full account of the computational characteristics of systolic arrays, it proposes two hardware-software co-designed fault-tolerance techniques, row (column) swapping and weight splitting, which effectively mitigate the impact of SAFs on model inference accuracy. Experimental results show that, compared with the conventional row (column) bypass strategy and the selective protection strategy, the proposed hardware-software co-designed strategy offers advantages in both execution efficiency and model accuracy recovery.
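The record gives no implementation details, so the following Python sketch is only a hypothetical illustration, not the authors' method, of why permuting the rows of a weight-stationary systolic array can mask a single Stuck-At fault. It assumes int8 weights, a purely functional (not cycle-accurate) model in which processing element (PE) (k, n) holds W[k][n], and a fault that pins one bit of that PE's weight register; `apply_saf`, `systolic_matmul` and `row_swap_for_fault` are invented names. The idea it relies on is that swapping rows k and k' of W together with columns k and k' of X leaves X·W unchanged, so a row whose weight already carries the stuck bit value can be moved onto the faulty PE.

```python
from typing import List, Optional, Tuple

Matrix = List[List[int]]
Fault = Tuple[int, int, int, int]  # hypothetical: (pe_row, pe_col, bit, stuck_value)


def apply_saf(weight: int, bit: int, stuck_value: int) -> int:
    """Force one bit of an int8 weight (two's complement) to a stuck value."""
    raw = weight & 0xFF                      # view the weight as an 8-bit pattern
    if stuck_value:
        raw |= 1 << bit                      # stuck-at-1
    else:
        raw &= ~(1 << bit) & 0xFF            # stuck-at-0
    return raw - 256 if raw >= 128 else raw  # convert back to signed int8


def systolic_matmul(x: Matrix, w: Matrix, fault: Optional[Fault] = None) -> Matrix:
    """Functional model of a K x N weight-stationary array computing Y = X @ W.

    PE (k, n) holds w[k][n]; if `fault` is given, that PE's weight is corrupted.
    """
    k_dim, n_dim = len(w), len(w[0])
    eff = [row[:] for row in w]
    if fault is not None:
        fk, fn, bit, val = fault
        eff[fk][fn] = apply_saf(eff[fk][fn], bit, val)
    return [[sum(xr[k] * eff[k][n] for k in range(k_dim)) for n in range(n_dim)]
            for xr in x]


def row_swap_for_fault(x: Matrix, w: Matrix, fault: Fault) -> Optional[Tuple[Matrix, Matrix]]:
    """Hypothetical remapping: put a weight whose stuck bit already equals the
    stuck value onto the faulty PE, so the fault has no effect.

    Swapping rows of W and the matching columns of X preserves X @ W.
    Returns None when no such row exists.
    """
    fk, fn, bit, val = fault
    for k in range(len(w)):
        if (((w[k][fn] & 0xFF) >> bit) & 1) == val:
            w2 = [row[:] for row in w]
            w2[fk], w2[k] = w2[k], w2[fk]
            x2 = [row[:] for row in x]
            for row in x2:
                row[fk], row[k] = row[k], row[fk]
            return x2, w2
    return None


if __name__ == "__main__":
    x = [[1, 2, 3], [4, 5, 6]]       # M x K activations
    w = [[1, -2], [3, 4], [-1, 5]]   # K x N weights
    fault = (0, 0, 6, 1)             # PE (0, 0): bit 6 stuck-at-1
    print(systolic_matmul(x, w))         # [[4, 21], [13, 42]]
    print(systolic_matmul(x, w, fault))  # [[68, 21], [269, 42]]
    x2, w2 = row_swap_for_fault(x, w, fault)
    print(systolic_matmul(x2, w2, fault))  # [[4, 21], [13, 42]]
```

In this example the stuck-at-1 on bit 6 turns the weight 1 into 65 and corrupts output column 0; after the swap the faulty PE holds -1, whose bit 6 is already 1, so the result matches the fault-free one. When no suitable row exists the sketch returns None, and a real design would need another mechanism; the abstract names weight splitting as its second technique, which this sketch does not model.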
Keywords: convolutional neural network; fault-tolerant design; Stuck-At fault; systolic array; convolutional neural network accelerator
CLC number: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]