Study on Sensitive Information Recognition Technology with High-Level Semantic Understanding Based on ELMo


Authors: CHEN Ziqin (陈紫琴), WU Peng (吴鹏), LI Lecheng (李乐成) (Hubei Provincial Center of National Internet Emergency Center, Wuhan 430074, China)

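Affiliation: [1] Hubei Branch of the National Computer Network Emergency Response Technical Team/Coordination Center of China (CNCERT/CC), Wuhan, Hubei 430074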

Source: Changjiang Information & Communications (《长江信息通信》), 2024, No. 7, pp. 119-122 (4 pages)

Abstract: The continuous development of social media has led to a proliferation of information online, yet the openness of the internet also makes it easy for politically sensitive content to spread. Against this backdrop, efficiently and accurately screening out such content has become an urgent problem. To address this challenge, this paper proposes a deep learning-based sensitive information identification method that operates at the level of semantic hierarchy. Input text is first converted into dynamic (context-dependent) word vectors using ELMo; a fine-to-coarse strategy based on high-level semantics and a hybrid model based on multiple contexts are then built, and the final identification results are obtained through a variant-word recognition algorithm based on association rules. Experiments on a Sina Weibo dataset show that the proposed method performs well. In particular, the ELMo-based dynamic word vectors outperform traditional word2vec and GloVe representations, indicating the potential of ELMo for sensitive information identification tasks.
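The abstract does not say how the association rules for variant-word recognition are mined. For reference only, the sketch below shows the textbook support and confidence measures for single-item rules a → b over token "transactions" (for example, the token set of each post). The function name `pairwise_rules`, its thresholds, and the toy tokens are hypothetical; this is a generic illustration, not the authors' algorithm.

```python
from collections import Counter
from itertools import combinations


def pairwise_rules(transactions, min_support=0.1, min_confidence=0.5):
    """Hypothetical helper: mine single-item rules a -> b.

    support(a -> b)    = count(a and b) / number of transactions
    confidence(a -> b) = count(a and b) / count(a)
    Returns a sorted list of (a, b, support, confidence) tuples.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # de-duplicate tokens within one transaction
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))  # unordered co-occurring pairs

    rules = []
    for (x, y), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair too rare to form any rule
        # Check both directions of the unordered pair.
        for a, b in ((x, y), (y, x)):
            confidence = count / item_counts[a]
            if confidence >= min_confidence:
                rules.append((a, b, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    # Toy, made-up token sets standing in for posts.
    transactions = [["a", "b", "c"], ["a", "b"], ["b", "d"], ["a", "b", "d"]]
    print(pairwise_rules(transactions, min_support=0.5, min_confidence=0.6))
    # -> [('a', 'b', 0.75, 1.0), ('b', 'a', 0.75, 0.75), ('d', 'b', 0.5, 1.0)]
```

Restricting rules to single items keeps the sketch short; a full Apriori-style miner would also grow larger itemsets level by level.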

Keywords: text detection; word vector generation; sensitive information recognition; natural language processing; deep learning

CLC number: TP393 [Automation and Computer Technology: Computer Application Technology]
