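Authors: 张慧 (ZHANG Hui)[1]; 秦董洪 (QIN Donghong)[1]; 白凤波 (BAI Fengbo); 罗余特 (LUO Yute); 刘成星 (LIU Chengxing); 宋蕃桦 (SONG Fanhua)
Affiliation: [1] College of Artificial Intelligence, Guangxi Minzu University, Nanning, Guangxi 530000, China
Source: Journal of Chinese Information Processing (《中文信息学报》), 2024, No. 9, pp. 93-107 (15 pages)
Funding: Guangxi Science and Technology Base and Talent Special Project (桂科AD23026054); Guangxi Zhuang Autonomous Region Central Government Guided Local Science and Technology Development Fund Project (桂科ZY24212045).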
Abstract: Current Chinese named entity recognition (NER) models have low accuracy in complex contexts. To address this, the paper adds more Chinese character features to compensate for the shortcomings of word vectors in representing character form and pronunciation, and introduces more prior knowledge to enrich the semantic features. It also designs an encoder that captures both global and local features, improving the model's robustness and generalization in complex contexts. Experimental results show that the proposed method improves the F1 score by 1.61, 0.37, 0.98, and 0.98 on the Weibo, OntoNotes 5.0, Boson, and People Daily datasets, respectively, which confirms the importance and generality of features intrinsic to Chinese characters and shows that local text features help improve model performance. The paper further examines how eight different Chinese character encoding methods affect model performance: compared with individual pinyin letters, the initials and finals of Chinese characters carry more pronunciation information, and features such as tone and polyphonic characters also improve performance. Finally, the model is tested on a variety of text examples, and the results support the effectiveness of the proposed work.
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
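Note: the abstract contrasts individual pinyin letters with initial/final/tone features. As a rough illustration only, the sketch below splits a tone-numbered pinyin syllable into those three parts. It is not the paper's implementation: the function name `split_pinyin`, the tone-number input format, the use of 5 for the neutral tone, and the choice to treat `y` and `w` as initials are all assumptions made for this example.

```python
from typing import Tuple

# Hypothetical helper, not taken from the paper.
# Two-letter initials come first so "zh" is matched before "z".
# Assumption: "y" and "w" are treated as initials (a common loose convention).
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_pinyin(syllable: str) -> Tuple[str, str, int]:
    """Split a tone-numbered syllable such as 'zhang1' into (initial, final, tone)."""
    syllable = syllable.strip().lower()
    tone = 5  # assumption: 5 marks the neutral tone when no digit is given
    if syllable and syllable[-1].isdigit():
        tone = int(syllable[-1])
        syllable = syllable[:-1]
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):], tone
    # Zero-initial syllables such as "an" or "er" have an empty initial.
    return "", syllable, tone


if __name__ == "__main__":
    for s in ("zhang1", "hui4", "an4", "ma"):
        print(s, split_pinyin(s))
```

Under these assumptions, a character encoder could embed the initial, final, and tone separately instead of embedding each pinyin letter, which is the kind of comparison the abstract describes.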