Sensitive Personal Information Detection System Based on ElasticSearch  Cited by: 7

Authors: ZHANG Wen; SHENG Yingyi; ZHANG Xiaoqing; MENG Shengxiang; ZHOU Bei[1]; SHEN Jian[1]

Affiliation: [1] School of Computer Science and Engineering, Changshu Institute of Technology, Changshu, Jiangsu 215500, China

Source: Journal of Changshu Institute of Technology, 2022, No. 5, pp. 33-36 (4 pages)

Abstract: Leakage of sensitive personal information is currently one of the most frequent types of network security incident. It may endanger personal and property safety and damage citizens' reputation and their physical and mental health. This paper uses a web crawler to obtain web page content and attachments, extracts their body text, and builds full-text indexing and querying on ElasticSearch to detect sensitive personal information. Taking mobile phone numbers as an example, the paper tests query efficiency with different tokenizers and query methods. It concludes that, among the approaches tested, building the full-text index with a custom tokenizer and querying with regular expressions is the most efficient way to detect sensitive personal information.
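The combination the abstract reports as most efficient (a custom tokenizer plus a regular-expression query) can be sketched as follows. This is an illustration under stated assumptions, not the authors' configuration: the abstract does not describe their tokenizer, so the names `digit_run_tokenizer`, `digit_run_analyzer`, and the field `content` are hypothetical, as is the phone-number pattern (11 digits, leading 1, second digit 3 to 9). The sketch only builds the request bodies as plain dictionaries and emulates the matching locally, so it runs without an ElasticSearch cluster.

```python
import json
import re

# Assumption: a mobile number is 11 digits, starts with 1, second digit 3-9.
# Lucene regexp syntax has no \d shorthand, so digit classes are spelled out.
MOBILE_PATTERN = "1[3-9][0-9]{9}"


def build_index_body(field="content"):
    """Index body with a hypothetical analyzer that emits each digit run as one token."""
    return {
        "settings": {
            "analysis": {
                "tokenizer": {
                    "digit_run_tokenizer": {
                        "type": "pattern",
                        "pattern": "[0-9]+",
                        "group": 0,  # emit the whole match as a token instead of splitting on it
                    }
                },
                "analyzer": {
                    "digit_run_analyzer": {
                        "type": "custom",
                        "tokenizer": "digit_run_tokenizer",
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                field: {"type": "text", "analyzer": "digit_run_analyzer"}
            }
        },
    }


def build_regexp_query(field="content"):
    """Regexp query body; the pattern must match an entire indexed term."""
    return {"query": {"regexp": {field: {"value": MOBILE_PATTERN}}}}


def find_mobile_numbers(text):
    """Local emulation: tokenize into digit runs, then require a full-token match."""
    tokens = re.findall(r"[0-9]+", text)
    return [token for token in tokens if re.fullmatch(MOBILE_PATTERN, token)]


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
    print(json.dumps(build_regexp_query(), indent=2))
    print(find_mobile_numbers("Call 13812345678 or 010-12345678"))
```

Because a regexp query is matched against whole terms, the tokenizer decides what can be found: emitting each digit run as a single term lets the pattern reject longer digit strings that merely contain an 11-digit substring. An analyzer like this one discards all non-digit text, so in practice it would more plausibly sit on a dedicated sub-field next to a general-purpose analyzer.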

Keywords: web crawler; ElasticSearch; sensitive personal information leakage

Classification code: TP399 [Automation and Computer Technology: Computer Application Technology]

 
