Authors: YU Shi-yuan (余诗媛), GUO Shu-ming (郭淑明) [2], HUANG Rui-yang (黄瑞阳) [2], ZHANG Jian-peng (张建朋) [2], HU Nan (胡楠) (School of Software, Zhengzhou University, Zhengzhou 450001, China; National Digital Switching System Engineering and Technological R&D Center, Zhengzhou 450002, China)
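Affiliations: [1] School of Software, Zhengzhou University, Zhengzhou 450001, Henan, China; [2] National Digital Switching System Engineering and Technological R&D Center, Zhengzhou 450002, Henan, China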
Source: Computer Technology and Development (《计算机技术与发展》), 2022, No. 9, pp. 161-166, 179 (7 pages in total)
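Funding: National Natural Science Foundation of China, Youth Fund Project (62002384); China Postdoctoral Science Foundation, General Program (47698); Zhengzhou Collaborative Innovation Major Project (162/32410218).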
Abstract: Nested named entities carry rich semantic relationships and structural information, so algorithms that identify them accurately are of significant research value. Existing Chinese nested named entity datasets contain mislabeled and missing annotations, and most existing recognition methods ignore the associations among the internal information of nested entities, which lowers accuracy. To address both problems, a new Chinese nested named entity dataset, NEPD, is constructed by combining automatic generation with manual annotation, and on this basis a Chinese nested named entity recognition model based on hierarchical region exhaustion is designed. First, the model traverses the text to combine candidate entities and obtains the word embedding information of the lower encoding layer. Second, to let adjacent encoding layers exchange information, the word embedding information of the lower encoding layer is merged into the higher encoding layer. Finally, multiple decoding layers ensure that a named entity of length L is predicted only at layer L, which effectively prevents error propagation and improves recognition accuracy. Experimental results show that, without external knowledge resources, the LREM model reaches F1 values of 87.19% and 86.27% on nested and non-nested named entities, respectively; the F1 value for non-nested named entities is 1.18% higher than that of the traditional BiLSTM+CRF model, which verifies the reliability of the model.
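The layer-per-length idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `build_layers`, the `(start, end)` span convention, and the element-wise mean used to merge lower-layer information into the next layer are all assumptions made for illustration, since the abstract does not specify how the layers are fused or how each layer is decoded.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end), end exclusive; hypothetical convention


def build_layers(
    token_vecs: List[List[float]], max_len: int
) -> Dict[int, Dict[Span, List[float]]]:
    """Enumerate every region of the text, grouped by length.

    Layer 1 holds one vector per token. Layer L holds one vector per
    span of exactly L tokens, built from two overlapping spans of
    layer L-1, so lower-layer information flows upward.
    """
    n = len(token_vecs)
    layers: Dict[int, Dict[Span, List[float]]] = {
        1: {(i, i + 1): list(v) for i, v in enumerate(token_vecs)}
    }
    for length in range(2, min(max_len, n) + 1):
        below = layers[length - 1]
        current: Dict[Span, List[float]] = {}
        for start in range(n - length + 1):
            left = below[(start, start + length - 1)]
            right = below[(start + 1, start + length)]
            # Assumed fusion rule: element-wise mean of the two
            # overlapping lower-layer regions (a stand-in only).
            current[(start, start + length)] = [
                (a + b) / 2 for a, b in zip(left, right)
            ]
        layers[length] = current
    return layers


# Four tokens with one-dimensional toy embeddings.
layers = build_layers([[1.0], [2.0], [3.0], [4.0]], max_len=3)
for length, spans in layers.items():
    print(length, sorted(spans))
```

Because each layer contains only spans of a single length, a classifier attached to layer L can only emit entities of length L. That is the property the abstract credits with preventing errors made at one layer from propagating to the others.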
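Keywords: nested named entity recognition; hierarchical region exhaustion; convolutional neural network; bidirectional long short-term memory network; information extraction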
Classification number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]