DUWe: Dynamic Unknown Word Embedding Approach for Web Anomaly Detection


Authors: WANG Li [1,2,3], CHEN Gang, XIA Mingshan [1,2], HU Hao

Affiliations: [1] Institute of High Energy Physics, Chinese Academy of Sciences (CAS), Beijing 100049, China; [2] Spallation Neutron Source Science Center (SNSSC), Dongguan, Guangdong 523803, China; [3] University of Chinese Academy of Sciences, Beijing 100049, China

Source: Computer Science (《计算机科学》), 2024, Issue S01, pp. 914-918 (5 pages)

Funding: National Natural Science Foundation of China (11905239, 12005248, 12105303).

Abstract: When existing deep-learning-based word embedding methods are applied to Web anomaly detection, words that do not appear in the corpus (out-of-vocabulary, OOV) are usually mapped to a single unknown token and given a zero or random vector as model input, ignoring the context in which the unknown word occurs in the Web request. At the same time, programmers tend to design request paths according to certain patterns, following personal habit and to improve code readability, so the words in a Web request are often semantically related. Considering these request patterns and the correlation between word semantics, this paper studies DUWe (Dynamic Unknown Word Embedding), a dynamic unknown-word representation method based on Word2vec that assigns a vector to each unknown word by analyzing the context of words in the Web request path. Experiments on the CSIC-2010 and WAF Dataset datasets show that adding the unknown-word representation performs better than static Word2vec feature extraction alone: accuracy, precision, recall, and F1-score all improve, and training time is reduced by up to 1.14 times.
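The contrast the abstract draws between a static unknown vector and a context-derived one can be sketched roughly as follows. This is not the authors' DUWe algorithm, whose details are not given on this page. It is a minimal illustration that assumes an OOV token is represented by the mean of the in-vocabulary vectors within a small window around it, falling back to the zero vector when no neighbor is known. The names `EMBEDDINGS`, `tokenize`, and `embed_tokens`, the toy vector values, and the example path are all hypothetical; in practice the lookup table would come from a trained Word2vec model.

```python
import re

# Toy lookup table standing in for trained Word2vec vectors (made-up values).
EMBEDDINGS = {
    "tienda1": [0.9, 0.1, 0.0],
    "publico": [0.7, 0.3, 0.0],
    "jsp": [0.1, 0.1, 0.8],
}
DIM = 3


def tokenize(path):
    """Split a request path into lowercase alphanumeric tokens."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", path.lower()) if t]


def embed_tokens(tokens, embeddings, dim, window=2):
    """Return one vector per token.

    Known tokens use their stored vector. An OOV token gets the mean of the
    known vectors within `window` positions on either side (an assumed
    strategy, not necessarily the paper's); with no known neighbor it falls
    back to the zero vector, i.e. the static baseline.
    """
    vectors = []
    for i, tok in enumerate(tokens):
        if tok in embeddings:
            vectors.append(list(embeddings[tok]))
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [
            embeddings[tokens[j]]
            for j in range(lo, hi)
            if j != i and tokens[j] in embeddings
        ]
        if context:
            # Column-wise mean over the neighboring vectors.
            vectors.append([sum(col) / len(context) for col in zip(*context)])
        else:
            vectors.append([0.0] * dim)
    return vectors


if __name__ == "__main__":
    toks = tokenize("/tienda1/publico/anadir.jsp")
    for tok, vec in zip(toks, embed_tokens(toks, EMBEDDINGS, DIM, window=1)):
        print(tok, vec)
```

Here `anadir` is absent from the toy vocabulary, so its vector is derived from its neighbors `publico` and `jsp` rather than being fixed at zero. A weighted average or a model-based prediction could be substituted in the same place.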

Keywords: unknown words (OOV); Web anomaly detection; dynamic word embedding; word embedding optimization; deep learning

Classification: TP393 [Automation and Computer Technology - Computer Application Technology]

 
