Authors: Zhao Yuyuan, Wang Bin[2], Zhang Zedan, Li Qingshan, Hu Jianbin
Affiliations: [1] School of Software and Microelectronics, Peking University, Beijing 102627; [2] Chinese Medicine Data Center, China Academy of Chinese Medical Sciences, Beijing 100700; [3] Boya RegChain Beijing Inc., Beijing 100037; [4] School of Computer Science, Peking University, Beijing 100871
Source: Journal of Information Security Research (《信息安全研究》), 2024, No. 2, pp. 139-147 (9 pages)
Funding: National Natural Science Foundation of China, General Program (82274685).
Abstract: With the continuous advancement of data anonymization technology, accurately identifying private data has become a key challenge. Currently, privacy information extraction algorithms are primarily based on traditional natural language processing techniques, such as bidirectional recurrent neural networks and attention-based pretrained language models (like BERT and its variants). These models leverage their strong ability to represent contextual features, overcoming the limitations of traditional methods in representing polysemous words. However, there is still room for improvement in how accurately they determine entity boundaries. This study proposes a novel privacy information extraction algorithm that integrates structural prior knowledge through a privacy data structural knowledge enhancement mechanism, which improves the model's understanding of sentence semantic structure and thereby the accuracy of privacy information boundary determination. The model is also evaluated on multiple public datasets, and a detailed analysis of the experimental results demonstrates its effectiveness.
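Keywords: structural prior knowledge; structural enhancement mechanism; privacy information extraction algorithm; entity boundary determination; data anonymization; natural language processing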
Classification number: TP309.2 [Automation and Computer Technology: Computer System Architecture]