A Differential Privacy Text Desensitization Method for Enhancing Semantic Consistency

Original title: 强化语义一致性的差分隐私文本脱敏方法


Authors: Guan Yeli (关业礼); Luo Senlin (罗森林)[1]; Pan Limin (潘丽敏)[1]; Zhang Ji (张笈)[1]; Yu Jingwei (于经纬) (School of Information and Electronics, Beijing Institute of Technology, Beijing 100081; North Institute for Scientific and Technical Information, Beijing 100089)

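Affiliations: [1] School of Information and Electronics, Beijing Institute of Technology, Beijing 100081; [2] North Institute for Scientific and Technical Information, Beijing 100089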

Source: Journal of Information Security Research (《信息安全研究》), 2024, No. 8, pp. 706-711 (6 pages)

Funding: National Key Research and Development Program of China (2018YFC2000300).

Abstract: Text desensitization is an important privacy protection technique, but balancing its privacy protection effect against semantic consistency with the original text is difficult. Existing differential privacy desensitization methods choose a substitute for each sensitive word by sampling with probabilities computed from similarity. The substitute can therefore be semantically inconsistent with, or even unrelated to, the original text, which seriously degrades how well the desensitized text preserves the original meaning. This paper proposes a differential privacy text desensitization method that strengthens semantic consistency: a truncated distance metric formula adjusts the probability of selecting each replacement word and restricts semantically irrelevant replacements. Experimental results on real datasets show that the method effectively improves the semantic consistency between the desensitized text and the original text and has substantial practical value.
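The abstract does not give the truncated distance formula itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes Euclidean distance between word embeddings, an exponential-mechanism-style weight `exp(-epsilon * d / 2)`, and a hard cutoff beyond which a candidate gets zero probability. The names `truncated_distance`, `selection_probabilities`, `sample_replacement`, the `cutoff` parameter, and the toy embeddings are all hypothetical. The sketch makes no claim about the formal privacy guarantee, which depends on details the abstract does not state.

```python
import math
import random

# Made-up 2-D embeddings for illustration only; real systems would use
# pretrained word vectors.
TOY_EMBEDDINGS = {
    "doctor": (0.0, 0.0),
    "physician": (0.1, 0.0),
    "nurse": (0.3, 0.1),
    "banana": (5.0, 5.0),
}


def truncated_distance(u, v, cutoff):
    """Euclidean distance, or infinity once it exceeds the cutoff.

    The hard cutoff is an assumed form of truncation, not the paper's formula.
    """
    d = math.dist(u, v)
    return d if d <= cutoff else math.inf


def selection_probabilities(word, embeddings, epsilon, cutoff):
    """Normalized weights exp(-epsilon * d / 2), zero for truncated candidates."""
    target = embeddings[word]
    weights = {}
    for candidate, vector in embeddings.items():
        d = truncated_distance(target, vector, cutoff)
        weights[candidate] = 0.0 if math.isinf(d) else math.exp(-epsilon * d / 2)
    # The word itself has distance 0 and weight 1, so the total is never zero.
    total = sum(weights.values())
    return {candidate: w / total for candidate, w in weights.items()}


def sample_replacement(word, embeddings, epsilon, cutoff, rng=random):
    """Draw one replacement word among candidates with nonzero probability."""
    probs = selection_probabilities(word, embeddings, epsilon, cutoff)
    candidates = [c for c, p in probs.items() if p > 0.0]
    return rng.choices(candidates, weights=[probs[c] for c in candidates], k=1)[0]


if __name__ == "__main__":
    probs = selection_probabilities("doctor", TOY_EMBEDDINGS, epsilon=2.0, cutoff=1.0)
    for candidate, p in probs.items():
        print(f"{candidate}: {p:.3f}")
    print(sample_replacement("doctor", TOY_EMBEDDINGS, epsilon=2.0, cutoff=1.0))
```

In this toy setting the unrelated word "banana" lies beyond the cutoff and can never be chosen, while nearer words keep probabilities that fall as distance grows. A smaller `epsilon` flattens the distribution over the remaining candidates.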

Keywords: text desensitization; differential privacy protection; semantic consistency; word embedding; inference attack

Classification code: TP391 [Automation and Computer Technology - Computer Application Technology]

 
