Name Entity Recognition in Legal Instruments Based on Matching Strategy and Community Attention Mechanism

Cited by: 14

Authors: GUO Lihua; LI Yang; WANG Suge [1,3]; CHEN Xin [1]; FU Yujie; PEI Wensheng

Affiliations: [1] School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China; [2] School of Finance, Shanxi University of Finance and Economics, Taiyuan, Shanxi 030006, China; [3] Key Laboratory of Ministry of Education for Computation Intelligence and Chinese Information Processing, Shanxi University, Taiyuan, Shanxi 030006, China; [4] Beijing LvDianTong Technology Co. Ltd, Taiyuan, Shanxi 030006, China

Source: Journal of Chinese Information Processing, 2022, No. 2, pp. 85-92 (8 pages)

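Funding: National Natural Science Foundation of China (62076158, 62106130, 62072294); Shanxi Province Graduate Innovation Project (2021Y160); Shanxi Province Key Research and Development Program (201803D421024); Shanxi Province Basic Research Program (20210302124084).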

Abstract: Entity names in judicial case documents tend to be long, and the entities are strongly correlated with one another. Based on these characteristics, this paper proposes a named entity recognition method for legal documents that uses a forward maximum matching strategy and a community attention mechanism (FMM-CAM). The method uses forward maximum matching to preferentially obtain the longer matching words for each character in a legal document. It assigns each matching word to one of four matching-word communities (B, M, E, S) according to the position of the character within the word, and applies a community self-attention mechanism to obtain correlation weights among the different matching words. Concretely, character representations from BERT and Word2Vec are concatenated with the compressed matching-word vectors of the communities and fed into a BiLSTM to obtain a semantic representation of the sentence. A CRF then decodes the sentence to produce the optimal tag sequence. The experimental results show that the proposed method can effectively determine the boundaries of entities in legal documents, such as evidence names, proof contents, and case file numbers.

Keywords: legal documents; named entity recognition; self-attention; BiLSTM

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
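
The matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `forward_maximum_match` and `assign_communities`, the `max_len` parameter, and the toy lexicon are all hypothetical, and treating an unmatched character as a single-character word in the S community is an assumption made here for illustration. The sketch covers only forward maximum matching and the B/M/E/S community assignment; the community self-attention, BiLSTM, and CRF stages are not shown.

```python
from typing import Dict, List, Set

TAGS = ("B", "M", "E", "S")  # begin, middle, end, single


def forward_maximum_match(sentence: str, lexicon: Set[str], max_len: int = 8) -> List[str]:
    """Greedy left-to-right segmentation that prefers the longest lexicon word."""
    words: List[str] = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, shrinking down to one character.
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + length]
            # Assumption: an unmatched single character is kept as its own word.
            if length == 1 or candidate in lexicon:
                words.append(candidate)
                i += length
                break
    return words


def assign_communities(words: List[str]) -> List[Dict[str, List[str]]]:
    """For each character, put its matched word into the B, M, E, or S community."""
    per_char: List[Dict[str, List[str]]] = []
    for word in words:
        for pos in range(len(word)):
            communities: Dict[str, List[str]] = {tag: [] for tag in TAGS}
            if len(word) == 1:
                tag = "S"
            elif pos == 0:
                tag = "B"
            elif pos == len(word) - 1:
                tag = "E"
            else:
                tag = "M"
            communities[tag].append(word)
            per_char.append(communities)
    return per_char


# Hypothetical toy lexicon and sentence, for illustration only.
lexicon = {"证人", "证人证言", "笔录"}
words = forward_maximum_match("证人证言和笔录", lexicon)
communities = assign_communities(words)
```

With this toy lexicon, the longer entry "证人证言" is chosen over its prefix "证人", which is the behaviour the abstract attributes to the matching strategy. In a full model, each community's words would then be embedded and weighted by attention before being concatenated with the character representation.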
