Authors: 张天宇[1], 孙媛媛[1], 杜文玉, 邢铁军[3], 林鸿飞[1], 杨亮[1] (ZHANG Tianyu; SUN Yuanyuan; DU Wenyu; XING Tiejun; LIN Hongfei; YANG Liang)
Affiliations: [1] School of Computer Science, Dalian University of Technology, Dalian 116024, China; [2] Procuratorial Technology and Information Research Center, Supreme People's Procuratorate, Beijing 100726, China; [3] Neusoft Corporation, Dalian 116024, China
Source: Journal of Tsinghua University (Science and Technology) (《清华大学学报(自然科学版)》), 2024, No. 5, pp. 749-759 (11 pages)
Funding: National Key Research and Development Program of China (2022YFC3301801); Fundamental Research Funds for the Central Universities (DUT22ZD205).
Abstract: Named entity recognition (NER) in legal documents is a key task for intelligent justice. Existing sequence labeling models attend only to character information, so in legal-document NER they cannot capture semantic information or the context of words, and they cannot constrain entity boundaries. This paper therefore proposes a judicial NER model that fuses external information and constrains boundaries: semantic and boundary enhanced named entity recognition (SBENER). A corpus of 400,000 theft-crime legal documents was collected for the model. First, a model is pretrained on this corpus, and the resulting theft-crime word vectors are used as external information fed into the model. Second, an Adapter is designed to fuse the theft-crime information into the character sequence and strengthen semantic features. Finally, a boundary pointer network constrains entity boundaries, which addresses the loss of word information and the lack of boundary constraints in sequence labeling models. In experiments on the CAILIE 1.0 and LegalCorpus datasets, SBENER reaches F_1 scores of 88.70% and 87.67%, respectively, outperforming the other baseline models. SBENER can therefore improve named entity recognition in the judicial domain.
[Objective] Named entity recognition (NER), a central task in information extraction, aims to precisely identify various named entity types in text, including personal names, locations, and organization names. In the Chinese NER domain, deep learning techniques are crucial for character and vocabulary representation and for feature extraction, and they have yielded remarkable research results. Common deep learning models for NER include sequence labeling, span-based approaches, generative methods, and table-based strategies. Nevertheless, the task suffers from a scarcity of lexical information, which is perceived as a primary obstacle to building high-performance Chinese NER systems. Although extensive lexical dictionaries with rich vocabulary boundaries and semantic information have been developed, incorporating this lexical knowledge effectively into the Chinese NER task remains a considerable challenge. In particular, integrating the semantic information of matched vocabulary and its contextual cues into the Chinese character sequence remains intricate, and accurately delimiting named entity boundaries is still a notable concern. In intelligent judicial systems, NER within legal documents has attracted significant attention. However, prevailing sequence labeling models rely predominantly on character information, which limits their capacity to capture semantic and lexical context and leaves entity boundary constraints inadequately addressed. To resolve these challenges, this paper introduces a model called semantic and boundary enhanced named entity recognition (SBENER). To enhance the semantic features of legal documents within the SBENER model, external information containing vocabulary pertinent to theft crimes is integrated. Initially, word vectors for theft crime terms are acquired through pretraining. Subsequently, a vocabulary dictionary tree is constructed, enabling the identification of potential vocabulary candidates for e
Classification number: TP393.1 [Automation and Computer Technology: Computer Application Technology]
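
Note: the abstract mentions building a vocabulary dictionary tree to find candidate words in the character sequence. The following is a minimal sketch of that kind of trie lookup, written as an illustration only and not as the authors' implementation. The lexicon entries, the sample sentence, and the names `TrieNode`, `build_trie`, and `match_words` are all hypothetical.

```python
from typing import Dict, List, Tuple


class TrieNode:
    """One node of a character-level dictionary tree (trie)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False  # True if the path from the root spells a lexicon word


def build_trie(lexicon: List[str]) -> TrieNode:
    """Insert every lexicon word into a trie, one character per level."""
    root = TrieNode()
    for word in lexicon:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def match_words(sentence: str, root: TrieNode) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, word) for every lexicon word found in the sentence."""
    matches: List[Tuple[int, int, str]] = []
    for start in range(len(sentence)):
        node = root
        for end in range(start, len(sentence)):
            node = node.children.get(sentence[end])
            if node is None:
                break  # no lexicon word continues with this character
            if node.is_word:
                matches.append((start, end + 1, sentence[start:end + 1]))
    return matches


# Hypothetical lexicon and sentence, for illustration only.
lexicon = ["盗窃", "盗窃罪", "手机"]
trie = build_trie(lexicon)
spans = match_words("被告人盗窃手机", trie)
```

Each matched span tells a downstream model which characters fall inside a known word, which is the kind of lexical signal a character-only sequence labeler lacks.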