Dependency Parsing for Chinese Electronic Medical Records Enhanced by Dual-scale Collaboration of Large and Small Language Models


Authors: XU Siyao; ZENG Jianjun; ZHANG Weiyan; YE Qi; ZHU Yan

Affiliations: [1] School of Mathematics, East China University of Science and Technology, Shanghai 200237, China [2] School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China

Source: Computer Science (《计算机科学》), 2025, No. 2, pp. 253-260 (8 pages)

Funding: Shanghai Special Fund for Promoting High-Quality Industrial Development (2021-GZL-RGZN-01018).

Abstract: Dependency parsing is a core natural language processing task that aims to identify the dependency relations between the words of a sentence. General-purpose parsers, however, analyze Chinese electronic medical records inaccurately in two situations: when the components that signal grammatical structure are omitted, and when modifiers appear in varied positions. To address this, the paper proposes a dependency parsing method for Chinese electronic medical records based on the collaborative enhancement of large and small language models. It first analyzes the linguistic features of Chinese electronic medical records and proposes component completion to make the special grammatical structures of medical text explicit. A general-purpose parser then performs dependency parsing, and the resulting syntactic graph is corrected automatically using the prior grammatical knowledge of a large language model. Because the method focuses on narrowing the feature distribution gap between medical and general text, it is not constrained by the scarcity of annotated data in the medical domain. The authors annotated 444 test samples of Chinese electronic medical records to validate the method. Experiments show that, with only a small amount of annotated data, the method reaches an LAS of 92.42 and a UAS of 94.60, and that it achieves comparably strong results on Chinese electronic medical records from different departments.
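The LAS and UAS figures quoted in the abstract are the standard labeled and unlabeled attachment scores for dependency parsing. As a minimal sketch of how such scores are commonly computed (not the authors' evaluation script), the function below counts a token as correct for UAS when its predicted head matches the gold head, and for LAS when the relation label matches as well. The function name `attachment_scores`, the `(head, label)` tuple layout, and the toy labels are assumptions made for illustration; whether the paper excludes punctuation from scoring is not stated in the abstract, so every token is scored here.

```python
from typing import List, Tuple

# Hypothetical representation: each token is (head_index, relation_label),
# with head_index 0 standing for the artificial root.
Arc = Tuple[int, str]


def attachment_scores(gold: List[List[Arc]], pred: List[List[Arc]]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over all tokens in the corpus."""
    total = head_hits = labeled_hits = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences must share one tokenization")
        for (g_head, g_rel), (p_head, p_rel) in zip(gold_sent, pred_sent):
            total += 1
            if g_head == p_head:
                head_hits += 1          # correct head: counts toward UAS
                if g_rel == p_rel:
                    labeled_hits += 1   # correct head and label: counts toward LAS
    if total == 0:
        raise ValueError("no tokens to score")
    return 100.0 * head_hits / total, 100.0 * labeled_hits / total


# Toy four-token sentence (invented data, not taken from the paper).
gold = [[(2, "nsubj"), (0, "root"), (4, "amod"), (2, "obj")]]
pred = [[(2, "nsubj"), (0, "root"), (2, "amod"), (2, "iobj")]]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")  # UAS=75.00 LAS=50.00
```

In the toy sentence, three of four heads are correct and two of those also carry the correct label, so LAS can never exceed UAS, which is consistent with the 92.42 and 94.60 reported above.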

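Keywords: natural language processing; dependency parsing; Chinese electronic medical records; large language model; collaborative enhancement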

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
