Authors: HUANG Xuebo (黄学波); XU Zhengguo (徐正国); YAN Jikun (燕继坤) (State Key Laboratory of Blind Signals Processing, Chengdu 610041, China)
Source: Computer Engineering and Applications (《计算机工程与应用》), 2020, No. 16, pp. 199-203 (5 pages)
Abstract: In network protocol feature extraction, existing algorithms based on frequency statistics and sequence alignment fall short in time efficiency and accuracy, so a high-frequency similar sequence extraction method based on Simhash is proposed. Because the traditional Simhash algorithm is generally used in text processing, the protocol data are first "segmented into words" according to the characteristics of binary sequences. The length of the hash result and the number of comparisons are then reduced to further improve efficiency, which makes Simhash suitable for extracting high-frequency similar sequences. Experimental results show that the algorithm reaches an average coverage rate of 74.28%, with relatively high time efficiency at that level of accuracy.
Keywords: protocol analysis; binary sequence; Simhash; high-frequency similar sequence
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
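
Note: the abstract describes applying Simhash to binary protocol data by segmenting it into "words" and shortening the hash. The record does not give the paper's actual segmentation rule or parameters, so the following is only a minimal sketch of generic Simhash under stated assumptions: overlapping byte n-grams stand in for the segmentation step, a 32-bit fingerprint stands in for the reduced hash length, and the function names (`ngrams`, `token_hash`, `simhash`, `hamming`) are hypothetical, not taken from the paper.

```python
import hashlib
from collections import Counter


def ngrams(data: bytes, n: int = 2):
    """Split a byte string into overlapping n-byte tokens.

    Assumption: this stands in for the paper's 'word segmentation' step.
    """
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def token_hash(token: bytes, bits: int) -> int:
    """Hash one token to `bits` bits (bits <= 64) via a truncated BLAKE2b digest."""
    digest = hashlib.blake2b(token, digest_size=8).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


def simhash(data: bytes, n: int = 2, bits: int = 32) -> int:
    """Compute a `bits`-bit Simhash fingerprint, weighting tokens by frequency."""
    weights = [0] * bits
    for token, count in Counter(ngrams(data, n)).items():
        h = token_hash(token, bits)
        for i in range(bits):
            # A set bit votes +count for position i, a clear bit votes -count.
            weights[i] += count if (h >> i) & 1 else -count
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

With such fingerprints, two messages would typically be treated as similar when `hamming(simhash(m1), simhash(m2))` falls below a chosen threshold; the threshold and the value of `n` are tuning choices that this record does not specify.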