Fin-BERT-Based Event Extraction Method for the Chinese Financial Domain

Original title: 基于Fin-BERT的中文金融领域事件抽取方法


Authors: LI Yi (李熠); GENG Chaoyang (耿朝阳) [1]; YANG Dan (杨丹) (School of Computer Science and Engineering, Xi’an Technological University, Xi’an 710021, China)

Affiliation: [1] School of Computer Science and Engineering, Xi’an Technological University, Xi’an 710021, China

Source: Computer Engineering and Applications (《计算机工程与应用》), 2024, No. 14, pp. 123-132 (10 pages)

Abstract: Event extraction aims to extract content of interest to humans from large volumes of unstructured, event-related text. Most existing event extraction methods are built on general-purpose corpora and rarely take domain-specific prior knowledge into account. Most of them also handle documents containing multiple events poorly, and they perform badly on test sets with many negative examples. To address these problems, this paper proposes Fin-PTPCG, a model based on Fin-BERT (financial bidirectional encoder representation from Transformers) and PTPCG (pseudo-trigger-aware pruned complete graph). The method draws on the representational capacity of the Fin-BERT pre-trained model to incorporate domain-specific prior knowledge at the encoding stage. The event detection module stacks multiple binary classifiers so that the model can identify documents containing multiple events and screen out negative examples. Using the decoding module of the PTPCG model, entities are extracted, connected into a complete graph, and pruned by computing a similarity matrix; pseudo-triggers are selected to address the absence of annotated trigger words. Finally, an event classifier completes the extraction. On the ChFinAnn and Duee-fin datasets, the method improves the F1 score for event extraction over the baseline methods by 0.7 and 3.7 percentage points, respectively.
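The two mechanisms named in the abstract, event detection with one binary classifier per event type and pruning a complete entity graph with a similarity matrix, can be illustrated with a small Python sketch. It is not the authors' implementation. The event type names, the toy vectors, the 0.5 thresholds, and the use of cosine similarity are all assumptions made for illustration; the abstract does not specify the similarity function or the classifier form, and pseudo-trigger selection is not covered here.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def detect_event_types(doc_vec, classifiers, threshold=0.5):
    """Run one independent binary classifier per event type.

    Because each type is scored on its own, a document can match
    several types (multi-event) or none (a negative example).
    """
    detected = []
    for event_type, (weights, bias) in classifiers.items():
        logit = sum(w * x for w, x in zip(weights, doc_vec)) + bias
        if sigmoid(logit) > threshold:
            detected.append(event_type)
    return detected

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def prune_complete_graph(entity_vecs, threshold=0.5):
    """Start from the complete graph over entities and keep an edge
    only if the pairwise similarity reaches the threshold.
    Cosine similarity is an assumption, not taken from the paper."""
    names = list(entity_vecs)
    edges = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(entity_vecs[a], entity_vecs[b]) >= threshold:
                edges.add((a, b))
    return edges

# Hypothetical toy inputs: two event types, 2-dimensional document vectors.
CLASSIFIERS = {
    "pledge": ([1.0, 0.0], 0.0),
    "freeze": ([0.0, 1.0], 0.0),
}
ENTITY_VECS = {
    "company": [1.0, 0.0],
    "shares": [0.9, 0.1],
    "date": [0.0, 1.0],
}

if __name__ == "__main__":
    print(detect_event_types([2.0, 3.0], CLASSIFIERS))    # multi-event document
    print(detect_event_types([-2.0, -3.0], CLASSIFIERS))  # negative example
    print(prune_complete_graph(ENTITY_VECS))
```

Scoring each event type independently, rather than with a single softmax over types, is what lets one document yield several event types or none at all, which matches the multi-event and negative-example cases the abstract describes.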

Keywords: event extraction; event detection; information extraction; natural language processing

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
