Authors: 许思遥 (XU Siyao) [1]; 曾健骏 (ZENG Jianjun) [2]; 张维彦 (ZHANG Weiyan) [2]; 叶琪 (YE Qi) [2]; 朱焱 (ZHU Yan) [1]
Affiliations: [1] School of Mathematics, East China University of Science and Technology, Shanghai 200237, China; [2] School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
Source: 《计算机科学》 (Computer Science), 2025, No. 2, pp. 253-260 (8 pages)
Funding: Shanghai Special Fund for Promoting High-Quality Industrial Development (2021-GZL-RGZN-01018).
Abstract: Dependency parsing is an important natural language processing task that aims to identify the dependency relations between words in a sentence. For Chinese electronic medical records, however, current general-purpose parsers fail to parse accurately when components that indicate grammatical structure are omitted and when modifiers appear in varied positions. To address this problem, the paper proposes a dependency parsing method for Chinese electronic medical records based on the collaborative enhancement of large and small language models. First, the linguistic features of Chinese electronic medical records are analyzed, and component completion is proposed to make the special grammatical structures in medical text explicit. Then a general-purpose parser performs dependency parsing, and the resulting syntactic graph is automatically corrected using the prior grammatical knowledge of a large language model. Because the method focuses on narrowing the feature distribution gap between medical and general-domain text, it is not constrained by the lack of annotated data in the medical domain. The authors annotated 444 test samples of Chinese electronic medical records to validate the method. Experiments show that the method parses Chinese electronic medical records effectively: with only a small amount of annotated data, LAS reaches 92.42 and UAS reaches 94.60, and comparably strong results are obtained on records from different departments.
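The abstract reports results as LAS (labeled attachment score) and UAS (unlabeled attachment score). As a reading aid, the minimal sketch below shows how these two metrics are conventionally computed from gold and predicted parses. It is not the authors' evaluation script: the function name `attachment_scores`, the `(head, label)` token representation, and the toy labels are assumptions made for illustration only.

```python
from typing import List, Tuple

# One token's arc: (index of its head, dependency label).
# This representation is an assumption for illustration.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over the given tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted parses must cover the same tokens")
    if not gold:
        raise ValueError("no tokens to score")
    # UAS counts tokens whose predicted head is correct.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS additionally requires the dependency label to be correct.
    labeled_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * head_hits / n, 100.0 * labeled_hits / n


# Toy four-token sentence (hypothetical heads and labels).
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "nmod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (2, "nmod")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")
```

In the toy sentence, three of four heads are correct and two of four tokens have both the correct head and label, so LAS can never exceed UAS; this is consistent with the reported 92.42 (LAS) and 94.60 (UAS).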
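Keywords: natural language processing; dependency parsing; Chinese electronic medical records; large language model; collaborative enhancement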
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]