Chinese Named Entity Recognition Based on Multi-level Features of Chinese Characters and Local Features of Text

Authors: ZHANG Hui[1]; QIN Donghong[1]; BAI Fengbo; LUO Yute; LIU Chengxing; SONG Fanhua (College of Artificial Intelligence, Guangxi Minzu University, Nanning, Guangxi 530000, China)

Affiliation: [1] College of Artificial Intelligence, Guangxi Minzu University, Nanning, Guangxi 530000, China

Source: Journal of Chinese Information Processing (《中文信息学报》), 2024, Issue 9, pp. 93-107 (15 pages)

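Funding: Guangxi Science and Technology Base and Talent Special Project (Guike AD23026054); Central Government Guidance Fund for Local Science and Technology Development of Guangxi Zhuang Autonomous Region (Guike ZY24212045).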

Abstract: Current Chinese named entity recognition (NER) models achieve relatively low accuracy in complex contexts. To address this, the paper adds more Chinese character features to compensate for the weakness of word embeddings in representing character form (glyph) and pronunciation, and introduces more prior knowledge to enrich the semantic features. It also designs an encoder that captures both global and local features, improving the model's robustness and generalization in complex contexts. Experimental results show that the proposed method improves the F1 score by 1.61, 0.37, 0.98, and 0.98 on the Weibo, OntoNotes 5.0, Boson, and People Daily datasets, respectively. This confirms the importance and generality of the features intrinsic to Chinese characters, and shows that local features of the text help improve model performance. In addition, the paper examines how eight different Chinese character encoding schemes affect model performance. The experiments show that, compared with a single pinyin character, the initial and final of a Chinese character carry more pronunciation information, and that features such as tone and polyphonic characters also improve model performance. Finally, the model is tested on a variety of text examples, and the results demonstrate the effectiveness of the proposed approach.
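The abstract reports that splitting a character's pronunciation into initial, final, and tone carries more information than a single pinyin character. The paper's eight encoding schemes are not described on this page, so the following is only a minimal sketch of that kind of decomposition, assuming tone-numbered pinyin input (e.g. "zhong1"). The names `INITIALS`, `Syllable`, and `split_pinyin` are hypothetical illustrations, not the authors' code, and treating "y" and "w" as initials is a simplifying assumption.

```python
from typing import NamedTuple

# Standard Mandarin initials. Two-letter initials come first so that
# "zh", "ch", "sh" are matched before "z", "c", "s".
# Simplifying assumption: "y" and "w" are treated as initials here.
INITIALS = ("zh", "ch", "sh",
            "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s",
            "y", "w")


class Syllable(NamedTuple):
    initial: str  # "" for zero-initial syllables such as "an"
    final: str
    tone: int     # trailing digit of the input; 5 when no digit is given


def split_pinyin(syllable: str) -> Syllable:
    """Split a tone-numbered pinyin syllable (e.g. "zhong1") into parts."""
    s = syllable.strip().lower()
    if s and s[-1].isdigit():
        body, tone = s[:-1], int(s[-1])
    else:
        body, tone = s, 5  # no tone digit: treat as neutral tone
    for ini in INITIALS:
        # Require a non-empty final so that e.g. "m" alone is not split.
        if body.startswith(ini) and len(body) > len(ini):
            return Syllable(ini, body[len(ini):], tone)
    return Syllable("", body, tone)


if __name__ == "__main__":
    # Pinyin for the characters of "南宁市" (Nanning City)
    for syl in ["nan2", "ning2", "shi4"]:
        print(syl, split_pinyin(syl))
```

Each of the three fields could then be mapped to its own embedding table and concatenated with the character embedding, which is one plausible way to feed initial, final, and tone as separate features; how the paper actually combines them is not stated here.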

Keywords: glyph features; pinyin features; local text features; named entity recognition

Classification number (CLC): TP391 [Automation and Computer Technology: Computer Application Technology]
