A Feature String Mining Algorithm in Industrial Control Protocols Recognition

Original title: 一种工控协议识别中的特征字符串挖掘算法

Authors: HAI Yang[1]; XU Kui; LI Xiao-hui; ZENG Tao; TAO Jun

Affiliations: [1] Communications Department of Baoji Public Security Bureau, Baoji 721014, Shaanxi, China; [2] Baoji Chuangtianqinghang Technology Development Co., Ltd., Baoji 721000, Shaanxi, China; [3] School of Cyber Science and Engineering, Southeast University, Nanjing 210096, Jiangsu, China

Source: Computer Technology and Development (《计算机技术与发展》), 2024, Issue 1, pp. 200-205 (6 pages)

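Funding: China University Industry-University-Research Innovation Fund, Alibaba Cloud University Digital Innovation Special Project (2021ALA03006).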

Abstract: Identifying an industrial control protocol is the first step in studying it, and strings that occur frequently during communication are important features for that identification. To extract such feature strings, we propose a top-down frequent string mining algorithm that directly yields a non-redundant set of frequent strings. Because a top-down method otherwise has to handle very large raw data and many iterations, we borrow from the N-gram model and propose a data partitioning strategy that keeps the data handled during top-down processing manageable. During mining, we combine deletion of overlapping items with string splitting. Experimental results show that the algorithm identifies feature strings in multiple protocols, and that using the identified strings as features also achieves good results in protocol identification. We conclude that the algorithm extracts feature strings from industrial control protocols effectively.
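The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that support is the number of messages containing a string, that a fixed maximum window length (`max_len`) stands in for the N-gram-inspired data partitioning, and that "deleting overlapping items" and "string splitting" mean cutting each accepted string out of the data and continuing on the leftover fragments. The names `support_counts`, `mine_frequent_strings`, `max_len`, `min_len`, and `min_support` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def support_counts(fragments_per_msg: List[List[bytes]], n: int) -> Counter:
    """Count, for every n-byte window, how many messages contain it."""
    counts: Counter = Counter()
    for frags in fragments_per_msg:
        present = set()
        for f in frags:
            present.update(f[i:i + n] for i in range(len(f) - n + 1))
        counts.update(present)  # each message contributes at most 1 per window
    return counts


def mine_frequent_strings(messages: Sequence[bytes], max_len: int = 8,
                          min_len: int = 2, min_support: float = 0.5) -> List[bytes]:
    """Top-down sketch: try long windows first, then shorter ones.

    Assumption (not from the paper): once a string is accepted, all of its
    occurrences are cut out and the remaining pieces are mined separately,
    so shorter substrings of an accepted string are never reported.
    """
    fragments = [[bytes(m)] for m in messages]  # each message starts as one fragment
    threshold = min_support * len(messages)
    found: List[bytes] = []
    for n in range(max_len, min_len - 1, -1):
        while True:
            counts = support_counts(fragments, n)
            frequent = [s for s, c in counts.items() if c >= threshold]
            if not frequent:
                break
            # Pick one winner (highest support, ties broken by byte order),
            # then recount, because overlapping candidates may disappear.
            best = min(frequent, key=lambda s: (-counts[s], s))
            found.append(best)
            fragments = [
                [piece for f in frags for piece in f.split(best) if piece]
                for frags in fragments
            ]
    return found


if __name__ == "__main__":
    # Toy messages (made up): a shared "HEAD" field and a shared "OK" trailer
    # surrounded by bytes that differ from message to message.
    msgs = [
        b"\x01\x02HEAD\x10\x11OK",
        b"\x03\x04HEAD\x12\x13OK",
        b"\x05\x06HEAD\x14\x15OK",
    ]
    print(mine_frequent_strings(msgs, max_len=6, min_len=2, min_support=1.0))
    # prints [b'HEAD', b'OK']
```

Because the accepted string `HEAD` is removed before shorter windows are examined, substrings such as `HEA` or `EAD` never appear in the result, which is one simple way to obtain a non-redundant set. If `max_len` is shorter than a real feature string, this sketch reports only pieces of it, so the window length would need to be chosen with the protocol in mind.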

Keywords: frequent strings; top-down; data partitioning; feature extraction; data processing

Classification number: TP393 [Automation and Computer Technology: Computer Application Technology]

 
