Authors: 余坚 (YU Jian), 王俊峰 (WANG Jun-Feng), 陈熳熳 (CHEN Man-Man), 方智阳 (FANG Zhi-Yang)
Affiliations: [1] College of Computer Science (College of Software), Sichuan University, Chengdu 610065, China; [2] School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
Source: Journal of Sichuan University (Natural Science Edition), 2024, No. 4, pp. 14-26 (13 pages)
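Funding: National Natural Science Foundation of China (U2133208); National Key Research and Development Program of China (2022YFB3305200); Strategic Cooperation Project between Sichuan University and the Luzhou Municipal People's Government (2022CDLZ-5).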
Abstract: To address increasingly severe cyber threats, cyber attacks must be analyzed in depth to gain an advantage in cyberspace operations. Indicators of Compromise (IOCs), an essential part of Cyber Threat Intelligence (CTI), span the entire cyber attack lifecycle and accurately describe the key information (attack behaviors, threat entities, etc.) at each attack stage. Extracting IOCs from CTI supports cyber defense, attack tracing, and countermeasures. Existing IOC extraction methods based on machine learning or deep learning have made great progress, but they require a large amount of manually labeled CTI for training and are less effective when labeled CTI is limited. To tackle this challenge, this paper proposes Automatical IOC Extraction based on Less labeled data (L-AIE), a novel IOC extraction method that reduces labeling cost while still achieving high extraction accuracy with only a small amount of labeled CTI. L-AIE applies fine-grained (subword-level) tokenization to obtain enough information from less CTI, and uses a Context Layer and a Combination Layer to fully capture the context of IOC entities that are split into subwords. In the training stage, L-AIE adds a Relation Layer to enlarge the differences between IOC categories. Extensive experiments demonstrate that L-AIE depends less on the amount of labeled data and outperforms the compared methods. Trained on only about 10% of the data used for the other models, L-AIE achieves a macro F1 score of 87.54%, more than 20% higher than the other methods. When the amount of training data is reduced further, L-AIE's extraction performance is affected less than half as much as that of the other models.
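The headline result is reported as a macro F1 score, which averages F1 over IOC categories with equal weight, so rare categories count as much as common ones. The following is a minimal Python sketch of that metric, not the authors' evaluation code. The abstract does not say whether scores are counted per token or per entity span, so this sketch assumes one gold and one predicted label per item; the category names (IP, URL, HASH) and the "O" non-IOC label are hypothetical.

```python
from collections import Counter


def macro_f1(gold, pred, labels=None):
    """Return the unweighted mean of per-class F1 scores.

    gold, pred: equal-length sequences with one label per item.
    labels: classes to average over; defaults to every label in gold or pred.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if labels is None:
        labels = sorted(set(gold) | set(pred))
    if not labels:
        raise ValueError("no classes to average over")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1  # correct prediction for class g
        else:
            fp[p] += 1  # class p was predicted where it was wrong
            fn[g] += 1  # gold class g was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); score 0 if the class never occurs
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical IOC categories; "O" marks items that are not IOCs.
gold = ["IP", "URL", "HASH", "IP", "O", "URL"]
pred = ["IP", "URL", "IP", "IP", "O", "HASH"]
print(round(macro_f1(gold, pred, labels=["IP", "URL", "HASH"]), 3))  # 0.489
```

Here the per-category F1 scores are 0.8 (IP), 0.667 (URL) and 0 (HASH), so the missed HASH category pulls the macro score down to about 0.489, whereas micro F1 pooled over the same three categories would be 0.6 (3 TP, 2 FP, 2 FN). Passing `labels` explicitly keeps the non-IOC "O" label out of the average.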
CLC number: TP301.6 [Automation and Computer Technology > Computer System Architecture]