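Affiliations: [1] College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100 [2] National Engineering Research Center for Information Technology in Agriculture, Beijing 100097 [3] Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097 [4] Key Laboratory of Digital Village Technology, Ministry of Agriculture and Rural Affairs, Beijing 100097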
Source: Smart Agriculture (《智慧农业(中英文)》), 2025, No. 1, pp. 44-56 (13 pages)
Funding: Shaanxi Qinchuangyuan "Scientist + Engineer" Team Building Project (2022KXJ-67); National Natural Science Foundation of China (62206222).
Abstract: [Objective/Significance] Chinese kiwifruit texts exhibit dual-dimensional characteristics, vertical and horizontal, which appear in the contextual topics across paragraphs and in the left-right relationships between characters. Making full use of these dual-dimensional characteristics can further improve named entity recognition (NER) performance. On this basis, an NER method based on dual-dimensional information and pruning was proposed, named KIWI-Coord-Prune (kiwifruit-CoordKIWINER-PruneBi-LSTM). [Methods] Two modules, CoordKIWINER and PruneBi-LSTM, were designed to process the dual-dimensional information in Chinese kiwifruit texts precisely. The CoordKIWINER module significantly improves the model's ability to capture complex and nested entities, producing enhanced character vectors that carry more textual information. Building on the output of the CoordKIWINER module, the PruneBi-LSTM module strengthens the model's ability to learn and identify important features, further improving entity recognition. [Results and Discussions] Experiments were conducted on the self-built KIWIPRO dataset and four public datasets, People's Daily, ClueNER, Boson, and ResumeNER, and the method was compared with five advanced models: LSTM, Bi-LSTM, LR-CNN, Softlexicon-LSTM, and KIWINER. The proposed method achieved F1 scores of 89.55%, 91.02%, 83.50%, 83.49%, and 95.81% on the five datasets, respectively. [Conclusions] Compared with existing methods, the proposed method not only effectively improves NER for Chinese kiwifruit domain texts but also generalizes to a certain degree, and it can provide technical support for downstream tasks such as building related knowledge graphs and question-answering systems.
[Objective] Chinese kiwifruit texts exhibit unique dual-dimensional characteristics. Cross-paragraph dependencies form a complex semantic structure, which makes it challenging to capture the full contextual relationships of entities within a single paragraph and calls for models capable of robust cross-paragraph semantic extraction to understand entity linkages at a global level. However, most existing models rely heavily on local contextual information and struggle to handle long-distance dependencies, which reduces recognition accuracy. Furthermore, Chinese kiwifruit texts often contain highly nested entities. Such nesting and combination increase the complexity of grammatical and semantic relationships, making entity recognition more difficult. To address these challenges, this research proposed a novel named entity recognition (NER) method, KIWI-Coord-Prune (kiwifruit-CoordKIWINER-PruneBi-LSTM), which incorporated dual-dimensional information processing and pruning techniques to improve recognition accuracy. [Methods] The proposed KIWI-Coord-Prune model consisted of a character embedding layer, a CoordKIWINER layer, a PruneBi-LSTM layer, a self-attention mechanism, and a CRF decoding layer, which together performed entity recognition on the input character vectors. The CoordKIWINER and PruneBi-LSTM modules were specifically designed to handle the dual-dimensional features in Chinese kiwifruit texts. The CoordKIWINER module applied adaptive average pooling in two directions to the input feature maps and used convolution operations to separate the extracted features into vertical and horizontal branches. The horizontal and vertical features were then extracted independently by the Criss-Cross Attention (CCNet) mechanism and the Coordinate Attention (CoordAtt) mechanism, respectively. This module significantly enhanced the model's ability to capture cross-paragraph relationships and nested entity structures, thereby generating enriched character vectors containing more contextual information, which improved the overall repre
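The two-direction pooling step described for CoordKIWINER can be illustrated with a minimal, dependency-free sketch in the spirit of Coordinate Attention, where each position is rescaled by a row gate and a column gate derived from directional averages. It assumes a single-channel 2D feature map (for instance, rows as character positions and columns as feature dimensions), replaces the learned 1x1 convolutions with hypothetical scalar weights `w_row`, `w_col`, and `bias`, and omits the CCNet branch entirely. It is not the authors' CoordKIWINER implementation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def directional_pool(fmap: Matrix) -> Tuple[List[float], List[float]]:
    """Average-pool an H x W map along each direction.

    row_means[h] averages over the width (an H x 1 descriptor, the same as
    adaptive average pooling to output size (H, 1)); col_means[w] averages
    over the height (a 1 x W descriptor).
    """
    h, w = len(fmap), len(fmap[0])
    row_means = [sum(row) / w for row in fmap]
    col_means = [sum(fmap[i][j] for i in range(h)) / h for j in range(w)]
    return row_means, col_means


def coord_gate(fmap: Matrix, w_row: float = 1.0, w_col: float = 1.0,
               bias: float = 0.0) -> Matrix:
    """Rescale every position by (row gate) x (column gate).

    Hypothetical simplification: w_row, w_col and bias are scalar stand-ins
    for the learned 1x1 convolutions used in Coordinate Attention.
    """
    row_means, col_means = directional_pool(fmap)
    row_gate = [sigmoid(w_row * m + bias) for m in row_means]
    col_gate = [sigmoid(w_col * m + bias) for m in col_means]
    return [[fmap[i][j] * row_gate[i] * col_gate[j]
             for j in range(len(fmap[0]))]
            for i in range(len(fmap))]


if __name__ == "__main__":
    # Hypothetical 2 x 2 map: rows as character positions, columns as features.
    demo = [[1.0, 2.0], [3.0, 4.0]]
    print(directional_pool(demo))                  # ([1.5, 3.5], [2.0, 3.0])
    print(coord_gate(demo, w_row=0.0, w_col=0.0))  # [[0.25, 0.5], [0.75, 1.0]]
```

With all weights at zero every gate equals 0.5, so each value is scaled by 0.25. Learned weights would instead emphasize the rows and columns whose pooled descriptors are informative. In the original Coordinate Attention design the gates are computed per channel, from a shared convolution over the concatenated pooled descriptors, rather than from independent scalars as here.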