Named Entity Recognition Approach of Judicial Documents Based on Transformer  (Cited by: 1)


Authors: WANG Yingjie[1], ZHANG Chengye, BAI Fengbo, WANG Zumin[1]

Affiliations: [1] College of Information Engineering, Dalian University, Dalian 116622, China; [2] School of Artificial Intelligence, Guangxi Minzu University, Nanning 530006, China

Source: Computer Science (《计算机科学》), 2024, Issue S01, pp. 113-121 (9 pages)

Abstract: Named entity recognition is one of the key tasks in natural language processing and is the foundation of downstream tasks. At present, there is relatively little research on the judicial field, and many problems remain to be solved in the informatization and intelligent transformation of the judicial system. Compared with texts in other fields, judicial documents are highly specialized and have few corpus resources, which leads to low recognition performance on judicial documents. The research is therefore carried out in three parts. First, a multi-label hierarchical iterative annotation method (ML-HIA) is proposed, which can automatically annotate raw judicial document text and effectively improve entity recognition on judicial documents. Second, a feature-mixed Transformer (FM-Transformer) neural network model is proposed, which makes full use of the deep features of the inherent attributes of Chinese characters to recognize named entities in judicial documents. Finally, the proposed annotation method and model are compared experimentally with other neural network models. The proposed annotation method performs the judicial document annotation task fairly accurately. The proposed model improves considerably over the baseline models on the general-domain dataset and achieves good results on the judicial-domain dataset.

Keywords: natural language processing; data annotation; Transformer model; deep learning; judicial informatization

Classification No.: TP391 [Automation and Computer Technology: Computer Application Technology]

 
