检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:谈心 杨悉瑜 曹家俊 张源 TAN Xin;YANG Xi-Yu;CAO Jia-Jun;ZHANG Yuan(School of Computer Science,Fudan University,Shanghai 201203,China)
机构地区:[1]复旦大学计算机科学技术学院,上海201203
出 处:《软件学报》2022年第6期2030-2046,共17页Journal of Software
基 金:国家自然科学基金(U1836210,U1836213,U1736208,61972099,62172105);上海自然科学基金(19ZR1404800);上海市青年科技启明星计划(21QA1400700)。
摘 要:引用计数机制是现代软件中一种常见的内存管理技术.引用计数错误往往会导致内存泄露、释放后使用(useafterfree)等严重的安全问题.现有致力于提高引用计数安全性的工作都依赖于对引用计数的字段进行识别.然而,由于类似于Linux等软件系统的代码十分复杂,在代码中识别出引用计数字段是一项十分困难的工作.传统的基于代码模式匹配的引用计数字段识别方法一方面存在需要专家经验总结规则,人工开销大的问题;另一方面存在总结的模式无法覆盖所有情况,召回率较低等局限.针对这些问题,发现与字段有关的代码行为以及字段的名称可以用来表征这个字段的特征,帮助识别引用计数字段.基于这两个层面的特征,设计了一种基于多模态深度学习的引用计数字段识别方法,并面向Linux内核实现原型系统.测试数据表明:该原型系统的精确率、召回率分别为96.98%和93.54%,而传统的基于代码模式匹配的方法没有识别出任何引用计数字段.此外,在Linux内核上发现61个引用计数字段使用不安全的数据类型,并对其中21个向Linux内核社区提交数据类型转换补丁以提高引用计数字段的安全性,其中6个已经被合并到Linux内核代码主分支.Reference counting(refcount) is a common memory management technique in modern software. Refcount errors can often lead to severe memory errors such as memory leak, use-after-free, etc. Many efforts to harden refcount security rely on known refcount fields as their input. However, due to the complexity of software code, identifying refcount fields in source code is very challenging. Traditional methods of identifying refcount fields are mainly based on code pattern matching and have great limitations such as requiring expert experience to summarize patterns, which is a laborious job. Besides, the manually-summarized patterns do not cover all cases, resulting in a low recall. To address these issues, this studyproposes to characterize a field based on the field name and the code behaviour associated with the field;and designs a multimodal deep learning based approach. The study implements a prototype of the new approach for Linux kernel code. In the evaluation, the precision and recall achieved by the prototype system are 96.98% and 93.54%. In contrast, the traditional code-pattern-based identification method did not report any refcount fields on the testing set. In addition, sixty-one refcount fields are identified which are implemented with insecure data types in the latest Linux kernel. Until now, twenty-one of them are reported to the Linux community, of which six have been confirmed.
分 类 号:TP311[自动化与计算机技术—计算机软件与理论]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.43