Identifying Multi-Type Entities in Legal Judgments with Text Representation and Feature Generation  Cited by: 5


Authors: Wang Hao [1,2], Lin Kerou, Meng Zhen, Li Xinlei (School of Information Management, Nanjing University, Nanjing 210023, China; Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China)

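Affiliations: [1] School of Information Management, Nanjing University, Nanjing 210023 [2] Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023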

Source: Data Analysis and Knowledge Discovery, 2021, Issue 7, pp. 10-25 (16 pages)

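Funding: General Program of the National Natural Science Foundation of China (Grant No. 72074108); Nanjing University special fund for young interdisciplinary teams in the humanities and social sciences (Grant No. 2020300093); one of the research outputs of talent development programs including the Jiangsu Young Social Science Talents program and the Nanjing University Zhongying Young Scholars program.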

Abstract: [Objective] This paper compares how different models perform at entity recognition in legal judgments, laying the groundwork for building a legal knowledge base. [Methods] We extracted the court trial process and court opinion sections from criminal judgments to build a dataset. We then compared a CRFs model that uses manually constructed features against IDCNN-CRFs and BiLSTM-CRFs models, which generate features automatically and use pre-trained word vectors as their text representation. We also compared the models' transfer ability on a small set of other types of legal judgment texts. [Results] The ALBERT-BiLSTM-CRFs model achieved the best recognition performance, with a micro-averaged F1 of 95.28%. The IDCNN-CRFs model scored lower, but its training time was about 1/6 that of the ALBERT-BiLSTM-CRFs model. Both models transferred well. [Limitations] Most of the recognized entities are general-purpose ones; future work will annotate more domain-specific entities to increase the study's value for practical applications. [Conclusions] For entity recognition in legal judgments, the ALBERT-BiLSTM-CRFs and IDCNN-CRFs models outperform the CRFs model and show stronger transfer ability.
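The abstract reports a micro-averaged F1 but does not spell out how it was computed. As a minimal sketch of the usual definition, assuming entities are scored as exact (type, start, end) span matches pooled over all entity types, the metric can be written as below. The function name `micro_f1` and the toy spans are hypothetical and are not taken from the paper.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over entity spans.

    gold, pred: lists with one set per sentence; each set holds
    (entity_type, start, end) tuples. Exact-match scoring is an
    assumption here, not something the abstract confirms.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # spans predicted with the correct type and boundaries
        fp += len(p - g)   # predicted spans with no exact gold match
        fn += len(g - p)   # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: one boundary error on the LOC entity.
gold = [{("PER", 0, 2), ("LOC", 5, 7)}, {("ORG", 1, 4)}]
pred = [{("PER", 0, 2), ("LOC", 5, 8)}, {("ORG", 1, 4)}]
score = micro_f1(gold, pred)
```

Because counts are pooled before dividing, frequent entity types weigh more heavily in the micro average than rare ones.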

Keywords: legal judgments; feature generation; conditional random fields; IDCNN-CRFs; ALBERT-BiLSTM-CRFs

Classification: D916.1 [Politics and Law: Law]; TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
