Co-driven Recognition of Semantic Entailment Based on Fusion of Transformer and HowNet Sememe Knowledge

Cited by: 1


Authors: CHEN Fan, HUANG Yan [2], ZHANG Xin-Fang [1]

Affiliations: [1] School of Mechanical Science & Engineering, Huazhong University of Science and Technology, Wuhan 430074, China; [2] School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China

Source: Computer Systems & Applications (《计算机系统应用》), 2023, Issue 5, pp. 291-299 (9 pages)

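Funding: National Key R&D Program of China (2021YFB2012202); Hubei Province Science and Technology Major Project (2020AEA011); Hubei Province Key R&D Program (2020BAB100, 2021BAA171, 2021BAA038).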

Abstract: Semantic entailment recognition aims to determine whether two sentences are semantically consistent and whether an entailment relationship holds between them. Existing methods, however, struggle with Chinese synonyms, polysemy, and long texts that are hard to understand. To address these problems, this paper proposes a co-driven Chinese semantic entailment recognition method based on the fusion of Transformer and HowNet sememe knowledge. First, a Transformer encodes the internal structural and semantic information of Chinese sentences at multiple levels (the data-driven part), and the external knowledge base HowNet is introduced to model the sememe knowledge associations between words (the knowledge-driven part). Then, soft attention computes the interactive attention between the two sentences, which is fused with the sememe matrix. Finally, a BiLSTM further encodes the concept-level semantic information of the text and infers semantic consistency and the entailment relationship. The method uses HowNet sememe knowledge to handle polysemy and synonyms, and uses the Transformer to handle long texts. Experiments on financial and multi-semantic paraphrase-pair datasets such as BQ, AFQMC, and PAWSX show that, compared with lightweight models such as DSSM, MwAN, and DRCN and pre-trained models such as ERNIE, the model improves the accuracy of Chinese semantic entailment recognition (by 2.19% over DSSM), keeps the parameter count under control (16 M), and adapts to long-text entailment recognition with texts of 50 characters or more.
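The abstract does not say how the soft-attention scores and the sememe matrix are combined, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a binary sememe matrix (1 when two words share a sememe) added to dot-product attention scores with a hypothetical weight `gamma`. The function names, the toy sememe dictionary, and the two-dimensional vectors standing in for Transformer outputs are all illustrative assumptions; none of them come from HowNet or from the paper.

```python
import math

def sememe_matrix(words_a, words_b, sememes):
    """M[i][j] is 1.0 if word i of A and word j of B share a sememe, else 0.0."""
    return [[1.0 if sememes.get(wa, set()) & sememes.get(wb, set()) else 0.0
             for wb in words_b]
            for wa in words_a]

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def fused_soft_attention(enc_a, enc_b, sem, gamma=1.0):
    """Soft-align each token of A to sentence B.

    Score(i, j) = dot(enc_a[i], enc_b[j]) + gamma * sem[i][j].
    The additive fusion and `gamma` are assumptions made for this sketch.
    """
    dim = len(enc_b[0])
    aligned = []
    for i, vec_a in enumerate(enc_a):
        scores = [sum(x * y for x, y in zip(vec_a, vec_b)) + gamma * sem[i][j]
                  for j, vec_b in enumerate(enc_b)]
        weights = softmax(scores)
        aligned.append([sum(w * vec_b[d] for w, vec_b in zip(weights, enc_b))
                        for d in range(dim)])
    return aligned

# Toy, made-up sememe annotations (NOT real HowNet entries).
TOY_SEMEMES = {
    "borrow": {"lend"}, "loan": {"lend"},
    "apple": {"fruit"}, "pear": {"fruit"},
}
words_a, words_b = ["borrow", "apple"], ["loan", "pear"]

# Stand-ins for encoder outputs, chosen so that plain dot-product
# attention would pair "borrow" with "pear" rather than "loan".
enc_a = [[1.0, 0.0], [0.0, 1.0]]
enc_b = [[0.0, 1.0], [1.0, 0.0]]

sem = sememe_matrix(words_a, words_b, TOY_SEMEMES)
plain = fused_soft_attention(enc_a, enc_b, sem, gamma=0.0)
fused = fused_soft_attention(enc_a, enc_b, sem, gamma=50.0)
```

With `gamma=0.0` the alignment follows vector similarity alone. With a large `gamma` the shared sememe dominates, so "borrow" aligns with "loan" even though their toy vectors are orthogonal, which is the kind of synonym handling the abstract attributes to the sememe knowledge. In the described pipeline the aligned vectors would then go to a BiLSTM for the final judgment; that step is omitted here.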

Keywords: sememe knowledge fusion; Transformer; HowNet; entailment recognition

Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
