A Study on Entity Extraction for Chinese Financial Statements Incorporating Convolutional Gating and Entity Boundary Prediction

Authors: WANG Ting; YANG Chuan; LIANG Jiaying; XIANG Dong; ZOU Maoyang[1]

Affiliation: [1] School of Computer Science, Chengdu University of Information Technology, Chengdu, Sichuan 610225, China

Source: Software Guide (《软件导刊》), 2024, No. 7, pp. 25-33 (9 pages)

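Funding: Key R&D Projects of the Science and Technology Department of Sichuan Province (2021YFG0031, 2022YFG0375); Sichuan Science and Technology Service Industry Demonstration Project (2021GFW130).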

Abstract: In the financial field, financial statements play an important role in enterprise development planning, but extracting useful information from them still relies heavily on manual work. To make this extraction more efficient, this paper proposes a named entity recognition (NER) method for financial statements that fuses key information with entity boundary information. First, a convolutional gating unit, composed of two convolutional layers, a self-attention mechanism, and a gating mechanism, extracts local features from the encoder's output and screens key information to guide entity recognition. Second, an entity boundary prediction module integrates entity boundary information into long-sequence semantic features that encode sentence-level dependencies. Finally, the key information and the boundary-enhanced long-sequence semantic features are fed into a conditional random field (CRF) layer, which captures dependencies between adjacent labels that conform to the entity annotation rules and produces the globally optimal label sequence. Experiments show that the proposed model achieves F1 scores of 95.75% and 94.92% on the Resume and MSRA datasets, respectively, outperforming all compared models and demonstrating its effectiveness for Chinese NER. On a financial statement dataset, it achieves a precision of 87.93%, a recall of 92.45%, and an F1 score of 90.13%, outperforming the baseline model and showing that it can effectively recognize named entities in the financial domain.
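The abstract gives no implementation details for the CRF step, so the following is only a minimal sketch of the general technique, not the authors' code. It assumes a standard linear-chain CRF over BIO tags, where the "entity annotation rules" are modeled as hard constraints (an `I-X` tag may only follow `B-X` or `I-X`, and a sequence may not start with `I-X`), and decoding uses the Viterbi algorithm. The label set, the helper names `allowed` and `viterbi`, and the toy scores are hypothetical. In the paper's model, the emission scores would presumably come from the fused key-information and boundary-aware features, and the transition scores would be learned.

```python
from typing import Dict, List, Sequence

NEG_INF = float("-inf")


def allowed(prev: str, curr: str) -> bool:
    """Hypothetical BIO rule: I-X may only follow B-X or I-X of the same type X."""
    if curr.startswith("I-"):
        return prev in (f"B-{curr[2:]}", f"I-{curr[2:]}")
    return True


def viterbi(emissions: Sequence[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]],
            labels: List[str]) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions[t][y]   : per-token score for label y (e.g. from an encoder).
    transitions[a][b] : score for moving from label a to label b.
    Transitions that break the BIO rule are forced to -inf.
    """
    if not emissions:
        return []
    # Best path score ending in each label at position 0; I-X cannot start a sequence.
    score = {y: (NEG_INF if y.startswith("I-") else emissions[0][y]) for y in labels}
    backpointers: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        new_score: Dict[str, float] = {}
        bp: Dict[str, str] = {}
        for curr in labels:
            best_prev, best = labels[0], NEG_INF
            for prev in labels:
                trans = transitions[prev][curr] if allowed(prev, curr) else NEG_INF
                cand = score[prev] + trans
                if cand > best:
                    best_prev, best = prev, cand
            new_score[curr] = best + emissions[t][curr]
            bp[curr] = best_prev
        score = new_score
        backpointers.append(bp)
    # Trace back from the best-scoring final label.
    path = [max(labels, key=lambda y: score[y])]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return path[::-1]


labels = ["O", "B-ORG", "I-ORG"]
transitions = {a: {b: 0.0 for b in labels} for a in labels}  # toy: all-zero transitions
emissions = [
    {"O": 0.2, "B-ORG": 0.5, "I-ORG": 0.9},
    {"O": 0.1, "B-ORG": 0.2, "I-ORG": 0.8},
    {"O": 0.7, "B-ORG": 0.1, "I-ORG": 0.3},
]
print(viterbi(emissions, transitions, labels))  # ['B-ORG', 'I-ORG', 'O']
```

A per-token argmax over these toy scores would give the illegal sequence `['I-ORG', 'I-ORG', 'O']`. Jointly decoding the whole sentence under the transition constraints instead returns a well-formed entity span, which illustrates what "globally optimal label sequence" means here. As a side check, the reported financial-dataset scores are consistent with F1 = 2PR/(P+R): 2 × 87.93 × 92.45 / (87.93 + 92.45) ≈ 90.13.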

Keywords: finance; named entity recognition; convolutional gating unit; entity boundary prediction; conditional random field

CLC number: TP391.1 [Automation and Computer Technology / Computer Application Technology]

 
