Authors: Guan Yeli, Luo Senlin[1], Pan Limin[1], Zhang Ji[1], Yu Jingwei
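Affiliations: [1] School of Information and Electronics, Beijing Institute of Technology, Beijing 100081; [2] North Institute for Scientific and Technical Information, Beijing 100089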
Source: Journal of Information Security Research (《信息安全研究》), 2024, No. 8, pp. 706-711 (6 pages)
Funding: National Key Research and Development Program of China (2018YFC2000300).
Abstract: Text desensitization is an important privacy protection method, but balancing its privacy protection against semantic consistency with the original text is difficult. Existing differential privacy desensitization methods choose substitutes for sensitive words by sampling with probabilities computed from similarity, which can yield substitutes that are semantically inconsistent with, or even unrelated to, the original text, seriously degrading how well the desensitized text preserves the original meaning. This paper proposes a differential privacy text desensitization method that strengthens semantic consistency: a truncated distance metric formula adjusts the selection probability of replacement words, limiting semantically unrelated replacements. Experiments on real datasets show that the method effectively improves the semantic consistency between the desensitized text and the original text and has substantial practical value.
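The abstract does not state the truncated distance formula, so the following is only a minimal illustrative sketch of one plausible reading: candidates are weighted by an exponential-mechanism-style score on embedding distance, and any candidate farther than a cutoff is excluded. The function names, the Euclidean distance, the `cutoff` parameter, the `epsilon / 2` scaling, and the toy embeddings are all assumptions made for illustration, not the paper's method, and the sketch makes no claim about the resulting privacy guarantee.

```python
import math
import random


def truncated_replacement_probs(target, embeddings, epsilon, cutoff):
    """Return {candidate: probability} for replacing `target`.

    Hypothetical rule: weight = exp(-epsilon * d / 2), where d is the
    Euclidean distance between embeddings; candidates with d > cutoff
    are dropped (the assumed "truncation").
    """
    target_vec = embeddings[target]
    weights = {}
    for word, vec in embeddings.items():
        d = math.dist(target_vec, vec)
        if d <= cutoff:  # assumed truncation: exclude distant candidates
            weights[word] = math.exp(-epsilon * d / 2)
    total = sum(weights.values())
    return {word: w / total for word, w in weights.items()}


def sample_replacement(target, embeddings, epsilon, cutoff, rng=random):
    """Draw one replacement word according to the truncated distribution."""
    probs = truncated_replacement_probs(target, embeddings, epsilon, cutoff)
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]


# Toy two-dimensional embeddings, invented purely for this example.
toy_embeddings = {
    "doctor": (0.0, 0.0),
    "physician": (0.1, 0.0),
    "nurse": (0.3, 0.1),
    "banana": (5.0, 5.0),
}

probs = truncated_replacement_probs("doctor", toy_embeddings, epsilon=1.0, cutoff=1.0)
replacement = sample_replacement("doctor", toy_embeddings, epsilon=1.0, cutoff=1.0)
```

With these toy values, the distant word "banana" falls outside the cutoff and can never be chosen, while closer words keep a probability that decreases with distance.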
Keywords: text desensitization; differential privacy protection; semantic consistency; word embedding; inference attack
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]