Clinical Relation Extraction via Dual Piecewise Attention Neural Tensor Network

Cited by: 2


Authors: WEI Hao (隗昊); TANG Huan-ling (唐焕玲)[3]; ZHOU Ai (周爱); ZHANG Yi-jia (张益嘉); CHEN Fei (陈飞)[2]; LU Ming-yu (鲁明羽)[2]

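Affiliations: [1] School of Software, Dalian University of Foreign Languages, Dalian, Liaoning 116044, China; [2] Information Science and Technology College, Dalian Maritime University, Dalian, Liaoning 116026, China; [3] School of Computer Science and Technology, Shandong Technology and Business University, Yantai, Shandong 264005, China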

Source: Acta Electronica Sinica (《电子学报》), 2023, No. 3, pp. 658-665 (8 pages)

Funding: National Natural Science Foundation of China (No. 61976124, No. 62072070).

Abstract: Biomedical relation extraction has made considerable progress. However, in clinical texts with complex sentence structure, the large number of long sentences and the high density of entity pairs within sentences limit further gains in the performance of current relation extraction models. We propose a relation extraction model that combines a tensor-based bidirectional gated recurrent unit network (Tensor-BiGRU) with piecewise attention mechanisms. A tensor weight matrix modifies the encoding of the BiGRU network, strengthening its ability to capture low-level features. Two piecewise attention mechanisms are then proposed to improve the model's ability to capture the features of long sentences. In addition, when a sentence contains multiple entity pairs, semantic features of the entity pairs are introduced to counter the resulting performance degradation. We further propose a weight-adaptive cross-entropy loss function to improve generalization when the sample distribution across relation categories in the dataset is imbalanced. Experimental results show that, without relying on any feature engineering or high-performance computing environment, the model achieves advanced performance on the 2010 i2b2/VA clinical relation extraction dataset.

Keywords: relation extraction; clinical text; neural tensor network; piecewise attention mechanism; sample imbalance

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
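The abstract mentions a cross-entropy loss whose weights adapt to the imbalance between relation categories, but this record does not give its formula. As an illustration of the general idea only, the following is a minimal sketch that assumes inverse-frequency class weights (the same heuristic as scikit-learn's "balanced" class weights); it is not the paper's loss function. The function names `class_weights` and `weighted_cross_entropy` and the toy labels are hypothetical.

```python
import math
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights (an assumption, not the paper's scheme).

    Scaled so that the weights average to 1 over the samples.
    """
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    return {c: total / (n_classes * k) for c, k in counts.items()}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def weighted_cross_entropy(logits_batch, labels, weights):
    """Weighted mean of -log p(true class), normalized by the total weight."""
    num = 0.0
    den = 0.0
    for logits, y in zip(logits_batch, labels):
        p_true = softmax(logits)[y]
        num += -weights[y] * math.log(p_true)
        den += weights[y]
    return num / den


# Toy data: relation class 0 is frequent, class 1 is rare.
labels = [0, 0, 0, 1]
weights = class_weights(labels)

# Same error pattern, once on a frequent-class sample and once on the rare one.
wrong_on_frequent = [[0.0, 2.0], [2.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
wrong_on_rare = [[2.0, 0.0], [2.0, 0.0], [2.0, 0.0], [2.0, 0.0]]
loss_frequent = weighted_cross_entropy(wrong_on_frequent, labels, weights)
loss_rare = weighted_cross_entropy(wrong_on_rare, labels, weights)
```

With these weights, a mistake on the rare class contributes more to the loss than the same mistake on a frequent class, which is the effect a loss aimed at imbalanced relation categories is meant to have.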

 
