Authors: HE Jianhu (何剑虎)[1], YI Shengyue (伊胜月)[1], SONG Liying (宋丽莹)
Affiliation: [1] Women's Hospital, School of Medicine, Zhejiang University (浙江大学医学院附属妇产科医院), Hangzhou 310006, Zhejiang Province, China
Source: China Digital Medicine (《中国数字医学》), 2025, No. 4, pp. 61-67 (7 pages)
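Funding: Zhejiang Provincial Medical and Health Science and Technology Project: Research on a System for Automatic Discovery and Desensitization of Personal Sensitive Information in Electronic Medical Documents (2022PY062).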
Abstract: Objective: To desensitize electronic medical documents when they are shared, and so protect patient privacy. Methods: A lexical analyzer for medical data that integrates multiple machine learning models was built, and corpora for Chinese word segmentation, part-of-speech tagging, and named entity recognition in the medical and health domain were compiled. Sensitive information in electronic medical documents was identified using natural language processing techniques such as Hidden Markov Models and Conditional Random Fields together with a built-in sensitive information signature library, and dynamic desensitization was implemented through streaming processing of result sets. Results: The model performed well on routine personal sensitive information, and discovering and desensitizing personal sensitive information took milliseconds on average. Conclusion: Combining natural language processing with a sensitive information signature library makes it possible to recognize and desensitize, in real time, sensitive information in unstructured electronic medical documents.
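Keywords: electronic medical documents; sensitive information recognition; dynamic desensitization; natural language processing; stream processing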
Classification: R197.3 [Medicine and Health: Health Services Administration]; R319 [Medicine and Health: Public Health and Preventive Medicine]
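
The abstract describes pairing a signature library with streaming processing of result sets, but the record gives no implementation details. The following is only a minimal sketch of that general idea, not the authors' system: the `SIGNATURES` patterns, the `mask` policy, and the function names are all hypothetical, and the Hidden Markov Model and Conditional Random Field components are left out entirely.

```python
import re
from typing import Iterable, Iterator

# Hypothetical signature library. The pattern names and regexes are
# illustrative assumptions, not the library described in the abstract.
SIGNATURES = {
    # 18-character resident ID: 17 digits plus a digit or X check character
    "id_card": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
    # 11-digit mobile number starting with 13 to 19
    "mobile": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
}


def mask(value: str, keep_head: int = 3, keep_tail: int = 2) -> str:
    """Replace the middle of a matched value with asterisks."""
    hidden = len(value) - keep_head - keep_tail
    if hidden <= 0:
        # Too short to keep any characters: hide the whole value.
        return "*" * len(value)
    return value[:keep_head] + "*" * hidden + value[len(value) - keep_tail:]


def desensitize(text: str) -> str:
    """Mask every signature match found in one piece of text."""
    for pattern in SIGNATURES.values():
        text = pattern.sub(lambda m: mask(m.group()), text)
    return text


def desensitize_stream(rows: Iterable[str]) -> Iterator[str]:
    """Mask rows one at a time as they are consumed, without buffering the set."""
    for row in rows:
        yield desensitize(row)


if __name__ == "__main__":
    for line in desensitize_stream(["Tel 13812345678", "No sensitive data"]):
        print(line)
```

Because `desensitize_stream` is a generator, each row is masked only when the consumer asks for it, which is one simple way to apply masking at read time rather than to stored data.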