Research on Sensitive Information Recognition of Internet News (互联网新闻敏感信息识别方法的研究)

Cited by: 11

Authors: LI Shu (李姝); ZHANG Xiang-xiang (张祥祥); YU Bi-hui (于碧辉); YU Jin-gang (于金刚) [3]

Affiliations: [1] School of Equipment Engineering, Shenyang Ligong University, Shenyang 110159, China; [2] University of Chinese Academy of Sciences, Beijing 100049, China; [3] Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, China

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2021, No. 4, pp. 685-689 (5 pages)

Funding: Supported by the National Key R&D Program of China (2019YFB1405804).

Abstract: Sensitive information recognition is key to cleaning up the Internet environment. In an era of information explosion, people obtain large amounts of information from the Internet every day, and filtering the sensitive information out of it matters greatly for social stability and harmony. Existing methods mainly filter by sensitive keywords; the keyword list must be updated continually, and such methods generalize poorly. The deep learning method based on a pre-trained model used in this paper can learn deeper semantic information from Internet news text, so it identifies and filters sensitive information more effectively and generalizes well, but using deep learning alone loses sensitive-keyword features to some extent. This paper, for the first time, combines the traditional sensitive-keyword method with deep learning for Internet sensitive information recognition, and proposes Mer-HiBert, a model that fuses sensitive-keyword features. Experimental results show that the model further improves performance over earlier sensitive-keyword methods and deep learning models.
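The abstract describes fusing sensitive-keyword features with semantic features from a pre-trained model, but this record does not give the Mer-HiBert architecture. The following is only a minimal sketch of one possible fusion, plain concatenation, and should not be read as the authors' method. The lexicon, the function names, the placeholder semantic vector, and the linear scoring head are all hypothetical.

```python
from typing import List, Sequence

# Hypothetical keyword lexicon; the paper's actual list is not given in this record.
SENSITIVE_KEYWORDS: List[str] = ["keyword_a", "keyword_b", "keyword_c"]


def keyword_features(text: str, lexicon: Sequence[str]) -> List[float]:
    """Multi-hot vector: 1.0 where the lexicon entry occurs in the text."""
    return [1.0 if kw in text else 0.0 for kw in lexicon]


def fuse(semantic: Sequence[float], keyword: Sequence[float]) -> List[float]:
    """Concatenate a semantic vector with a keyword vector (assumed fusion)."""
    return list(semantic) + list(keyword)


def score(features: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Linear score over fused features; a stand-in for a trained classifier head."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights)) + bias


if __name__ == "__main__":
    # Placeholder for the output of a pre-trained encoder such as BERT.
    semantic_vector = [0.2, -0.1]
    kw_vector = keyword_features("text with keyword_b", SENSITIVE_KEYWORDS)
    print(fuse(semantic_vector, kw_vector))
```

In a real system the semantic vector would come from the pre-trained encoder and the weights would be learned. Concatenation is used here only because it is the simplest way to keep the keyword signal visible to the classifier.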

Keywords: sensitive information recognition; sensitive keywords; BERT; attention; TextCNN

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
