Dynamic Multitask Learning Approach for Contract Information Extraction

Original title: 面向合同信息抽取的动态多任务学习方法 (cited by: 1)

Authors: WANG Hao-Chang (王浩畅) [1]; ZHENG Guan-Yu (郑冠彧); ZHAO Tie-Jun (赵铁军) [2]

Affiliations: [1] School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, Heilongjiang, China; [2] School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China

Source: Journal of Software (《软件学报》), 2024, No. 7, pp. 3377-3391 (15 pages)

Funding: National Natural Science Foundation of China (61402099, 61702093).

Abstract: Accurately extracting two types of information from contract texts, elements and clauses, can effectively improve contract review efficiency and facilitate services for all trading parties. However, current contract information extraction methods generally train single-task models to extract elements and clauses separately; they do not fully exploit the characteristics of contract texts and ignore the correlation between tasks. This study therefore uses a deep neural network structure to investigate the correlation between element extraction and clause extraction and proposes a multitask learning method. First, the two tasks are combined to build a basic multitask learning model for contract information extraction. Second, this model is optimized with an attention mechanism that further exploits the correlation between the tasks, yielding an attention-based dynamic multitask learning model. Finally, building on these two models, a dynamic multitask learning model that incorporates lexical knowledge is proposed to handle the complex semantic environment of document-level contract texts. Experimental results show that the proposed method can fully capture the features shared between tasks, achieves better extraction results than single-task models, and effectively handles entity nesting between elements and clauses, enabling joint extraction of contract elements and clauses. In addition, to verify the robustness of the method, experiments were conducted on public datasets from several domains, and the results show that the proposed method outperforms the baseline methods in every case.
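The entity nesting mentioned in the abstract can be illustrated with a small sketch. It is not the authors' model: it only shows what it means for element spans to sit inside clause spans when both tasks label the same tokens. The BIO tagging scheme, the example sentence, and the labels `DELIVERY`, `PARTY`, and `DATE` are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Span = Tuple[str, int, int]  # (label, start index, end index exclusive)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode one BIO tag sequence into labelled spans."""
    spans: List[Span] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.append((label, start, i))
            label = None
        # A "B-" tag opens a span; a stray "I-" tag is treated leniently as an opener.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            label, start = tag[2:], i
    return spans


def nested_pairs(elements: List[Span], clauses: List[Span]) -> List[Tuple[Span, Span]]:
    """Return (element, clause) pairs where the element lies inside the clause."""
    return [
        (e, c)
        for e in elements
        for c in clauses
        if c[1] <= e[1] and e[2] <= c[2]
    ]


# Hypothetical sentence with one tag layer per task (clause task, element task).
tokens = ["Seller", "shall", "deliver", "goods", "before", "2024-07-01", "."]
clause_tags = ["B-DELIVERY"] + ["I-DELIVERY"] * 5 + ["O"]
element_tags = ["B-PARTY", "O", "O", "O", "O", "B-DATE", "O"]

clause_spans = bio_to_spans(clause_tags)
element_spans = bio_to_spans(element_tags)

for element, clause in nested_pairs(element_spans, clause_spans):
    text = " ".join(tokens[element[1]:element[2]])
    print(f"{element[0]} '{text}' is nested in clause {clause[0]}")
```

A single flat tag sequence cannot mark the same token as part of both a clause and an element, which is one reason to predict the two layers jointly rather than with a single tagger.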

Keywords: multitask learning; contract text; joint information extraction; attention mechanism; entity nesting

Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
